AudioGuide (talk)

I think the point is less to do with the invariance of (higher-quefrency) MFCCs to the energy of the frame, and more to do with the contribution each frame makes to our perceptual impression of the whole. Consider that for, say, a percussive sound, you could have a lot of very low amplitude frames in the decay that are essentially mush, and don’t figure much in our experience of that sound. In a plain mean those frames count just as much as the attack. So, by weighting the means, essentially dumb matching processes are more likely to pull out things that feel similar to our ears.
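
To make that concrete, here is a rough sketch in Python/NumPy of a plain mean versus an amplitude-weighted mean over MFCC frames. The toy numbers, the function names, and the choice of linear frame amplitude as the weight are all my own assumptions for illustration; I'm not claiming this is how Orchidea computes its weights.

```python
import numpy as np

def plain_mean(mfccs):
    """Unweighted mean over frames; mfccs has shape (frames, coefficients)."""
    return np.asarray(mfccs, dtype=float).mean(axis=0)

def weighted_mean(mfccs, weights):
    """Mean over frames, each frame weighted by e.g. its amplitude (assumed weighting)."""
    return np.average(np.asarray(mfccs, dtype=float), axis=0, weights=weights)

# Toy percussive sound (made-up values): 2 loud attack frames, 8 quiet decay frames.
attack = np.tile([10.0, 5.0], (2, 1))   # the frames that carry the character of the sound
decay = np.zeros((8, 2))                # low-level "mush" in the tail
mfccs = np.vstack([attack, decay])

# Hypothetical per-frame linear amplitudes used as weights.
amps = np.array([1.0, 1.0] + [0.01] * 8)

plain = plain_mean(mfccs)               # pulled towards the decay frames
weighted = weighted_mean(mfccs, amps)   # stays close to the attack frames

print("plain mean:   ", plain)
print("weighted mean:", weighted)
```

With these toy values the plain mean gets dragged most of the way towards the quiet decay frames, while the weighted mean stays close to the attack frames, which is roughly the part of the sound we actually hear as "the sound".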

@tutschku’s Mfcc comparison thread demonstrates this, insofar as he’s getting audibly more satisfying matches from Orchidea (which uses a weighted mean) than from the vanilla mean via fluid.bufstats~.
