I’ve not followed this thread super closely, so I may be missing something about the specifics of your setup or approach, but it seems to me that you’re describing a classifier here, rather than dimensionality reduction.
You can obviously use dimensionality reduction as part of the recipe for classification, but in my testing/experience I got (significantly) better results without using UMAP (or PCA) first, just feeding the MFCCs + stats (104d in my “recipe”) directly into a classifier. It’s an old (and long) thread, but I go through my tests/processes/comparisons in this thread. The main outcome was that “raw” descriptors/stats worked best, and for me it was a matter of finding the right combination of stats and frequency range to get the best accuracy.
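To make that comparison concrete, here’s a minimal Python/sklearn sketch of the two pipelines (this is not what SP-Tools/FluCoMa does internally; PCA stands in for the UMAP step, random vectors stand in for real per-hit features, and “13 MFCCs × 8 stats = 104d” is just my assumption of the layout):

```python
# Minimal sketch of the two pipelines, assuming sklearn. The random
# vectors stand in for real per-hit feature vectors, so the printed
# scores are meaningless here; with real hits the "raw" pipeline is
# what won out for me.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_classes, per_class, n_dims = 4, 50, 104      # assuming 13 MFCCs x 8 stats
X = np.vstack([rng.normal(loc=i, scale=2.0, size=(per_class, n_dims))
               for i in range(n_classes)])
y = np.repeat(np.arange(n_classes), per_class)

# pipeline A: raw 104d features straight into the classifier
raw = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
# pipeline B: reduce first (PCA standing in for UMAP), then classify
reduced = make_pipeline(StandardScaler(), PCA(n_components=8),
                        KNeighborsClassifier(n_neighbors=3))

print("raw 104d  :", cross_val_score(raw, X, y, cv=5).mean())
print("PCA -> 8d :", cross_val_score(reduced, X, y, cv=5).mean())
```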
For a quick test you can use sp.classtrain and sp.classmatch from SP-Tools to see if that does what you want, and if so you can then refine/customize the specific descriptors that work well/better for piano.
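If it helps to see the general idea outside Max, here’s a tiny sketch of what that train/match workflow amounts to conceptually (my assumption of the rough shape, not SP-Tools’ actual implementation; the vectors and class labels are made up):

```python
# Hypothetical train/match loop: "train" stores labeled feature vectors,
# "match" returns the closest class for each new hit.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# train step: feature vectors captured per class (stand-in data)
X_train = np.array([[0.1, 0.9], [0.2, 0.8],    # class 0
                    [0.9, 0.1], [0.8, 0.2]])   # class 1
y_train = np.array([0, 0, 1, 1])
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# match step: classify each incoming hit as it arrives
new_hit = np.array([[0.15, 0.85]])
print("matched class:", clf.predict(new_hit)[0])   # -> 0
```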
If you don’t want to explicitly define classes, you can use kmeans/clustering to find however many points you like, and then use that to feed a classifier (sp.clustertrain in SP-Tools works for quick testing as well).
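Conceptually that’s just “let k-means invent the classes, then train a classifier on those labels”, something like this sketch (an assumed shape with stand-in data, not the actual sp.clustertrain internals):

```python
# k-means picks the classes, then a classifier learns them so new
# hits can be matched in real time. All data here is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 104))        # stand-in for per-hit descriptors

k = 5                                  # "however many points you like"
labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)

clf = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
print("cluster for a new hit:", clf.predict(rng.normal(size=(1, 104)))[0])
```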
At the moment I’ve been experimenting with an MLP version of the classifier, which needs to be trained/converged before use, and I’ve found the results better/faster too. (This is not implemented in the release version of sp.classtrain, but the dev one has it built in if you want to test that.)
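The general shape of that trade-off, as a hedged sklearn sketch rather than the dev version’s actual code: the MLP has an up-front fit/converge step, but matching afterwards is just a cheap forward pass.

```python
# MLP variant: must be trained to convergence before use, then
# classification is fast. Synthetic stand-in data again.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=i, size=(50, 104)) for i in range(4)])
y = np.repeat(np.arange(4), 50)

mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=2)
mlp.fit(X, y)                          # the train/converge step
print("final loss:", round(mlp.loss_, 4))
print("matched class:", mlp.predict(X[:1])[0])
```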