Regression + Classification = Regressification?

A bit of a bump here, as I’ve been playing with this using the latest update. It’s profoundly easier to try different descriptor/stats combinations now.

As a reminder, this is about trying to differentiate between subtly different hits on the snare (e.g. snare center vs snare edge).

I’ve learned a couple of things off the bat. Spectral shape stuff doesn’t seem to help here at all, and loudness, although often descriptive, isn’t ideal for (timbre) classification as it isn’t distinctive enough. So far I’ve gotten the best results just using straight MFCCs ((partially) loudness-weighted), but even with the simplicity of the ‘newschool’ stuff, I spent over an hour today manually changing settings/descriptors/stats, running a test example, then going again.
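To make “(partially) loudness-weighted” concrete, here’s a rough numpy sketch of the idea: one stats vector per hit, with the per-frame MFCCs weighted by a loudness envelope. The exact weighting scheme here is just an illustration, not what the FluCoMa objects do internally.

```python
import numpy as np

def weighted_mfcc_stats(mfcc_frames, loudness_frames):
    """Summarize one hit's per-frame MFCCs into a single vector,
    weighting frames by loudness so quiet frames count less.

    mfcc_frames:     (n_frames, n_coeffs) array
    loudness_frames: (n_frames,) array of per-frame loudness
    """
    # normalise loudness into weights that sum to 1
    w = loudness_frames - loudness_frames.min()
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))

    mean = np.average(mfcc_frames, axis=0, weights=w)                        # weighted mean
    std = np.sqrt(np.average((mfcc_frames - mean) ** 2, axis=0, weights=w))  # weighted std
    return np.concatenate([mean, std])
```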

This is an issue I had before, but the process of changing the analysis parameters in the ‘oldschool’ way meant that, at best, I could test one new processing variation per hour’s worth of coding. So it was all slow.

So I basically have 7 types of hits (e.g. center, edge, rim tip, rim shoulder, etc…), for which I have training and testing data that I know and can label. I do most of my testing on center/edge since those are the closest to each other and, as such, the most problematic ones.

I’m wondering what the best way would be to figure out which recipe/settings give the best results, given that I know what the training data is and what the corresponding test examples are. Max is pretty shitty for this kind of iterative/procedural stuff, so I was thinking of either something like @jamesbradbury 's ftis, or I remember @tedmoore ages ago spelunking similar things in SC. My thinking is something where I can point it at example audio, labels for the training data, then labeled testing examples, and have it find out that “these settings and descriptors/stats give the most accurate results” without me having to manually tweak and hunt for them.
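For illustration, the kind of loop I’m imagining would look something like this, a minimal Python sketch using librosa and scikit-learn rather than the FluCoMa objects; the file paths, labels, and function names are hypothetical stand-ins:

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# hypothetical placeholders: swap in your own labeled (path, label) pairs
train_files = [("train/center_01.wav", "center"), ("train/edge_01.wav", "edge")]
test_files = [("test/center_05.wav", "center"), ("test/edge_05.wav", "edge")]

def mfcc_stats(path, n_mfcc=13, fmin=20, fmax=20000, n_fft=1024):
    """One feature vector per hit: mean + std of each MFCC over time."""
    y, sr = librosa.load(path, sr=None, mono=True)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, fmin=fmin, fmax=fmax, n_fft=n_fft)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

def score_settings(**kw):
    """Train on the labeled training hits, return accuracy on the labeled test hits."""
    X_tr = np.array([mfcc_stats(p, **kw) for p, _ in train_files])
    y_tr = [lbl for _, lbl in train_files]
    X_te = np.array([mfcc_stats(p, **kw) for p, _ in test_files])
    y_te = [lbl for _, lbl in test_files]
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

print(score_settings(n_mfcc=20, fmin=200))  # accuracy for one "recipe"
```

The point being: once “try a recipe and get a number back” is a single function call, the hunting can be automated instead of done by hand.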

Where things get a bit sticky is that, so far, most of the improvements I’ve been seeing have come from tweaking specific MFCC settings: number of coeffs, zero padding or not, min/max freq range, and then the obvious stuff like stats/derivs. So it’s not as straightforward as “just analyze everything”, because there are even more permutations involved once those settings start changing too.
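That said, the combinatorial part is exactly what a dumb brute-force loop is good at. Continuing the sketch above (the grid values are just examples):

```python
import itertools

# example grid: every combination of these MFCC settings gets scored
grid = {
    "n_mfcc": [13, 20, 40],
    "fmin": [20, 100, 200],
    "fmax": [10000, 16000, 20000],
    "n_fft": [512, 1024, 2048],
}

results = []
for combo in itertools.product(*grid.values()):
    kw = dict(zip(grid.keys(), combo))
    results.append((score_settings(**kw), kw))  # score_settings from the sketch above

best_acc, best_kw = max(results, key=lambda r: r[0])
print(f"{len(results)} combinations tried, best accuracy {best_acc:.2f} with {best_kw}")
```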

I was also thinking, in parallel, that PCA/variance stuff may be useful here. This is perhaps a naive assumption, but I would imagine that whichever components best describe the variance would presumably also be best for differentiating between examples for classification.
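One way to sanity-check that assumption, continuing the same sketch: fit PCA on the training features and see whether keeping only the top-variance components actually helps the test accuracy (PCA being unsupervised, high variance doesn’t have to line up with class separation, so this is worth measuring rather than assuming).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# mfcc_stats, train_files, test_files come from the earlier sketch
X_tr = np.array([mfcc_stats(p) for p, _ in train_files])
y_tr = [lbl for _, lbl in train_files]
X_te = np.array([mfcc_stats(p) for p, _ in test_files])
y_te = [lbl for _, lbl in test_files]

scaler = StandardScaler().fit(X_tr)
pca = PCA().fit(scaler.transform(X_tr))

# does keeping only the highest-variance components help or hurt accuracy?
for k in (2, 5, 10):
    tr_k = pca.transform(scaler.transform(X_tr))[:, :k]
    te_k = pca.transform(scaler.transform(X_te))[:, :k]
    clf = KNeighborsClassifier(n_neighbors=1).fit(tr_k, y_tr)
    acc = accuracy_score(y_te, clf.predict(te_k))
    var = pca.explained_variance_ratio_[:k].sum()
    print(f"{k} components: {var:.0%} of variance, accuracy {acc:.2f}")
```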

///////////////////////////////////////////////////////////////////////////////////////////

So this is partly a philosophical question, which I’ve pointed at many times before: much of interfacing with ML (in my experience at least) is arbitrarily picking numbers that the computer will then tell me are no good, before sending me off to come back with better numbers.

The second part is more practical: is there something (semi-)pre-baked in ftis/SC that does this sort of automatic meta-parameter search?