I returned to this testing today, with the realization that I can’t have more than 9 dimensions (with my current training set), since PCA can’t give you more meaningful components than there are points in the training set.
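For anyone who wants to see that ceiling in numbers, here is a minimal sketch in Python/NumPy (not the FluCoMa patch itself). It assumes a training set of 10 points with 196 dimensions each; the 10 is a guess based on the 9-dimension limit, and the data is random stand-in data. After centering, n points can only span n − 1 directions, so every component beyond that has essentially zero variance.

```python
import numpy as np

def pca_explained_variance(data):
    """Return the variance captured by each principal component."""
    centered = data - data.mean(axis=0)
    # Singular values of the centered data give the component variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values ** 2 / (data.shape[0] - 1)

rng = np.random.default_rng(0)
# Assumption: 10 training points, each a 196-dimensional descriptor (random stand-in data).
points = rng.normal(size=(10, 196))
variance = pca_explained_variance(points)

# Count components whose variance is not numerically zero.
meaningful = int(np.sum(variance > 1e-10 * variance[0]))
print(meaningful)  # 9: one fewer than the number of points
```

Asking for more components than that just returns axes that carry no information about the training set.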
I’ve changed up the analysis now too and am taking 25 MFCCs (and leaving out the 0th coefficient), as well as two derivatives. So an initial fluid.dataset~ with 196 dimensions in it. So a nice chunky one…
I tested seeing how small I can get things and have OK matching.
196d → 2d = 28.9%
196d → 5d = 41.9%
196d → 8d = 44.1%
196d → 9d = 54.0%
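As a rough illustration of how a matching percentage like the ones above could be computed outside of Max, here is a sketch in Python/NumPy. Everything in it is an assumption for illustration: the data is synthetic, there is one training point per hit type, matching is plain nearest-neighbour on Euclidean distance, and the helper names (`fit_pca`, `project`, `matching_accuracy`) are made up. The numbers it prints have nothing to do with the real snare results.

```python
import numpy as np

def fit_pca(train, n_components):
    """Fit PCA on the training data; return the mean and the top axes."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(data, mean, axes):
    """Project data onto the principal axes found from the training set."""
    return (data - mean) @ axes.T

def matching_accuracy(train, train_labels, test, test_labels):
    """Fraction of test points whose nearest training point has the right label."""
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    predicted = train_labels[np.argmin(dists, axis=1)]
    return float(np.mean(predicted == test_labels))

# Synthetic stand-in data (assumption): 10 hit types, 196 dimensions,
# one training point per type, and 200 noisy test hits.
rng = np.random.default_rng(1)
n_classes, dims = 10, 196
train = rng.normal(size=(n_classes, dims))
train_labels = np.arange(n_classes)
test_labels = rng.integers(0, n_classes, size=200)
test = train[test_labels] + 0.5 * rng.normal(size=(200, dims))

for k in (2, 5, 8, 9):
    mean, axes = fit_pca(train, k)
    acc = matching_accuracy(project(train, mean, axes), train_labels,
                            project(test, mean, axes), test_labels)
    print(f"{dims}d -> {k}d = {acc:.1%}")
```

The important detail is that the PCA is fitted on the training set only, and the test hits are projected with that same fit before the nearest-neighbour lookup.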
Not quite the 67.05% I was getting with a 12d reduction (which is surprising as a couple of those dimensions would be dogshit based on the pseudo-bug of requesting more dimensions than points).
I also made another training set with 120 points in it, but I can’t as easily verify the validity of the matching since it’s essentially a whole set of different-sounding attacks on the snare. So I’ll go back and create a new set of training and testing data with something like 20 discrete hits in it, so I can test PCA going up to that many dimensions.
I’ll also investigate what I did to get that 67% accuracy above, to see if I can build off that. But having more MFCCs and derivs seems promising at the moment, particularly if I can squeeze the shit out of it with PCA.
edit:
It turns out I got 67.05% when running the matching with no dimensionality reduction at all. If I do the same with the 196d variant (25MFCC + 2derivs) I get 73.9% matching out of the gate. So sans reduction, this is working better.