Are you still using MFCCs in all this? I wonder whether they are the most effective feature for tracking the change in tone across a drum. Thinking out loud, the spectral envelope of a drum strike might be dominated by some resonances that shift in frequency but don’t change shape or distribution very much.
What happens if you use a small number of mel bands as your feature instead? Does the discrimination get better or worse?
We should try and set up some clustering examples with kmeans…
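
To make that concrete, here is a rough sketch in plain Python (standard library only) of what such a test could look like. Everything in it is an assumption for illustration: the "strikes" are synthetic decaying sinusoids rather than real drum recordings, the two resonance frequencies (200 Hz and 400 Hz), the sample rate, window length and band count are made up, and the helper names (`strike`, `mel_bands`, `kmeans`) are hypothetical. It is not how any particular toolkit computes mel bands or k-means, just a toy that clusters log mel-band energies.

```python
import cmath
import math
import random

SR = 8000      # sample rate in Hz (assumed for this toy)
N = 256        # analysis window length in samples (assumed)
N_BANDS = 8    # "a small number of mel bands"

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def strike(freq, rng):
    """Toy drum strike: one decaying resonance plus a little noise."""
    return [math.exp(-t / 80.0) * math.sin(2 * math.pi * freq * t / SR)
            + 0.001 * rng.uniform(-1, 1) for t in range(N)]

def power_spectrum(x):
    """Naive DFT power spectrum; slow, but fine for a short window."""
    spec = []
    for k in range(N // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        spec.append(abs(acc) ** 2)
    return spec

def mel_bands(x):
    """Log energy in N_BANDS triangular filters spaced evenly on the mel scale."""
    spec = power_spectrum(x)
    top = hz_to_mel(SR / 2)
    edges = [mel_to_hz(top * i / (N_BANDS + 1)) for i in range(N_BANDS + 2)]
    feats = []
    for b in range(N_BANDS):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        energy = 0.0
        for k, p in enumerate(spec):
            f = k * SR / N
            if lo < f < hi:
                w = (f - lo) / (mid - lo) if f <= mid else (hi - f) / (hi - mid)
                energy += w * p
        feats.append(math.log(energy + 1e-12))
    return feats

def dist2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

rng = random.Random(0)
# Four strikes per made-up "position", each with a slightly jittered resonance.
freqs = [200 + rng.uniform(-10, 10) for _ in range(4)] \
      + [400 + rng.uniform(-10, 10) for _ in range(4)]
features = [mel_bands(strike(f, rng)) for f in freqs]
labels = kmeans(features, k=2)
print(labels)
```

If the two groups of strikes end up with different labels, the mel bands are discriminating the shifted resonance. Swapping `mel_bands` for an MFCC function would give a like-for-like comparison on the same data, which is really the question here.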