In many discussions @weefuzzy (and others) have brought up how fragile MFCCs can be in the face of noise, with suggestions of using higher-order melbands in a similar manner. My tests with this were never super promising, as I always seemed to get better results out of 20 MFCCs vs even 40 melbands (for vanilla classification/matching).
The reason I’m making this thread now is that I’m finding that MFCCs are struggling a bit to find the exact match from a bunch of examples, particularly when followed by UMAP-ing.
I’m wondering if this is an issue of offline file analysis vs real-time onset detection sometimes landing +/- a few samples apart on the same (or very similar) sounds. I’m trying to mitigate this by running my offline analysis through the same onset detection algorithm (with a bit of preroll on all the files), but the results still aren’t great.
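For what it's worth, the alignment idea can be sketched like this — a toy numpy version where a crude RMS-threshold detector stands in for a real onset algorithm (the names `detect_onset` and `align`, the threshold, and the 5 ms preroll are all made up for illustration, not anything from a real library):

```python
import numpy as np

SR = 44100
FRAME = 64                 # detector resolution in samples (arbitrary toy value)
PREROLL = int(0.005 * SR)  # keep ~5 ms before the detected onset

def detect_onset(sig, thresh=0.01):
    # Toy stand-in for a real onset detector: return the start of the
    # first FRAME-sized block whose RMS crosses a fixed threshold.
    for start in range(0, len(sig) - FRAME, FRAME):
        if np.sqrt(np.mean(sig[start:start + FRAME] ** 2)) > thresh:
            return start
    return 0

def align(sig):
    # Trim so every excerpt starts PREROLL samples before its onset,
    # regardless of how much silence preceded it in the file.
    onset = detect_onset(sig)
    return sig[max(0, onset - PREROLL):]

# The same burst preceded by two different amounts of leading silence
rng = np.random.default_rng(0)
burst = rng.standard_normal(2048)
a = np.concatenate([np.zeros(320), burst])
b = np.concatenate([np.zeros(704), burst])

# After alignment both excerpts present the burst at the same position,
# so the downstream analysis sees identical frames.
a_al, b_al = align(a), align(b)
```

The catch, of course, is that the detector's resolution puts a floor on how well two takes can line up — which is exactly the few-samples jitter in question.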
I’m not getting great results from melbands either (a separate discussion/thread perhaps), but my question here is about what happens to MFCCs, and specifically dimensionally reduced MFCCs, when there are small differences in phase. Since it’s an “FFT of an FFT” which then gets shoved through some crazy manifold thing, it seems to me that even the same audio file offset by +/- a few samples could create radically different values.
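One way to sanity-check the phase intuition is to shift the analysis window a few samples and measure how far the coefficients move. Below is a quick numpy sketch using a crude MFCC stand-in (DCT of a floored log magnitude spectrum — no mel warping, which doesn't change the time-shift argument; everything here is a made-up toy, not any library's actual MFCC):

```python
import numpy as np

SR, N = 44100, 1024
win = np.hanning(N)

def cepstra(frame, n_coef=13):
    # The magnitude spectrum discards absolute phase, so a pure time shift
    # of a steady sound mostly just re-weights edge samples under the taper.
    mag = np.abs(np.fft.rfft(frame * win))
    # Floor the spectral valleys so log() doesn't blow up tiny leakage values
    logmag = np.log(mag + 1e-2 * mag.max())
    # DCT-II of the log spectrum -- the "FFT of an FFT" step, roughly
    k = np.arange(len(logmag))
    basis = np.cos(np.pi * np.arange(n_coef)[:, None] * (2 * k + 1)
                   / (2 * len(logmag)))
    return basis @ logmag

t = np.arange(4 * N)
tone = np.sin(2*np.pi*220/SR*t) + 0.5 * np.sin(2*np.pi*660/SR*t)
other = np.sin(2*np.pi*330/SR*t) + 0.5 * np.sin(2*np.pi*990/SR*t)

c_ref   = cepstra(tone[1000:1000+N])
c_shift = cepstra(tone[1005:1005+N])   # same sound, window 5 samples late
c_other = cepstra(other[1000:1000+N])  # genuinely different sound

dist_shift = np.linalg.norm(c_ref - c_shift)
dist_other = np.linalg.norm(c_ref - c_other)
```

In this toy, `dist_shift` comes out much smaller than `dist_other` — the magnitude step throws phase away, so on steady material a few samples of offset barely moves the coefficients. Where it plausibly does hurt is when the frame straddles a fast attack, and whatever small jitter survives can still get amplified by a nonlinear reduction like UMAP downstream.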
AND / OR
Is there a way to make MFCCs (potentially when being used with subsequent dimensionality reduction) more robust to minor changes in attack/onset?
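One mitigation I've seen people reach for (not claiming it solves this case) is pooling per-frame MFCCs into summary statistics — mean and standard deviation over the whole slice — before the dimensionality reduction, so that a framing offset of a few samples mostly just re-weights boundary frames rather than changing every value fed to UMAP. A minimal sketch, where the `(n_frames, n_coeffs)` MFCC matrix is assumed to come from whatever analysis you're already running:

```python
import numpy as np

def pool(mfcc_frames):
    # Summarise a (n_frames, n_coeffs) MFCC matrix into one vector of
    # per-coefficient mean and std. Statistics over all frames are
    # order-insensitive, so a slightly late onset barely moves them.
    return np.concatenate([mfcc_frames.mean(axis=0), mfcc_frames.std(axis=0)])

rng = np.random.default_rng(1)
frames = rng.standard_normal((40, 13))   # stand-in for 40 frames of 13 MFCCs

# A one-frame rotation stands in for the onset landing slightly later.
# (An idealisation: a real shift also perturbs the boundary frames a little.)
shifted = np.roll(frames, 1, axis=0)

v1, v2 = pool(frames), pool(shifted)     # pooled vectors come out (near) identical
```

The trade-off is temporal blur: the pooled vector no longer distinguishes where in the slice things happen, which may or may not matter for exact-match lookup.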