Making a well-sampled MFCC space for audio querying

I’ve not watched all of the new videos, but @tedmoore did an absolutely massive dump of workshop videos a few days ago which may be of interest. These in particular:

(p.s. @tedmoore the thumbnail of this gives me Max anxiety…)

There’s also the classic vid by @jamesbradbury:

It’s been my experience that dimensionality has essentially no impact on speed. A KDTree is fast/optimized enough that whether you’re looking through 13d or 250d data, it makes little difference.
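If you want to sanity-check that on your own corpus, here’s a minimal sketch of the kind of timing test I mean, using scipy’s cKDTree as a stand-in (an assumption on my part; FluCoMa’s KDTree object is its own implementation inside Max/SC/PD, and the corpus size here is made up):

```python
# Rough benchmark: KDTree query time at 13d vs. 250d on random data.
import time
import numpy as np
from scipy.spatial import cKDTree

n_points = 10_000  # hypothetical corpus size

for dims in (13, 250):
    data = np.random.rand(n_points, dims)
    tree = cKDTree(data)
    queries = np.random.rand(100, dims)

    start = time.perf_counter()
    tree.query(queries, k=1)  # nearest neighbour for each query point
    elapsed = time.perf_counter() - start

    print(f"{dims}d: {elapsed * 1000:.2f} ms for 100 queries")
```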

This is, ultimately, where the secret sauce lies: finding the descriptor space that makes the most sense for what you want to display/browse/represent. Be it a “simple” 2d thing à la CataRT (loudness/centroid), a more complex/processed MFCC space, or a blend of things (LTP, LTEp).
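Just to make the contrast concrete, here’s a sketch of what those two kinds of spaces might look like per slice, using librosa (again an assumption; the actual workflow would be FluCoMa objects, and the filename is hypothetical):

```python
# Two candidate descriptor spaces for a corpus slice:
# a "simple" 2d loudness/centroid space vs. summary stats over 13 MFCCs.
import numpy as np
import librosa

def describe_slice(y, sr):
    # 2d space: mean loudness (RMS in dB) and mean spectral centroid (Hz)
    loudness = float(np.mean(librosa.amplitude_to_db(librosa.feature.rms(y=y))))
    centroid = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))

    # More processed space: mean + std of 13 MFCCs (26d before any reduction)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    mfcc_stats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    return np.array([loudness, centroid]), mfcc_stats

y, sr = librosa.load("slice.wav", sr=None)  # hypothetical slice file
simple_2d, mfcc_26d = describe_slice(y, sr)
```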
