Ok, I got to some coding/tweaking and made a lofi “automatic testing” patch where I manually change settings and start the process, then I get back quantified results (in the form of a % of accurate identifications).

I’m posting the results here for posterity and hopeful usefulness for others (and myself when I inevitably forget again).

My methodology was to record some training audio where I hit the center of the drum around 30 times, then the edge another 30-40 times (71 hits total), then send in a different recording of pre-labeled hits. The classified hits were compared against the labels and an accuracy percentage was computed from that.
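For clarity, the accuracy number is nothing fancier than (correct classifications / total labeled hits). A minimal sketch of that comparison step in Python (the `center`/`edge` labels here are illustrative, not pulled from my patch):

```python
# Compare classifier output against pre-labeled hits and report % accuracy.
# The "center"/"edge" labels are illustrative; any label scheme works the same.

def accuracy(predicted, labeled):
    """Percentage of hits where the classifier agreed with the label."""
    if len(predicted) != len(labeled):
        raise ValueError("need one prediction per labeled hit")
    correct = sum(p == l for p, l in zip(predicted, labeled))
    return 100.0 * correct / len(labeled)

labels      = ["center", "center", "edge", "edge", "center"]
predictions = ["center", "edge",   "edge", "edge", "center"]
print(round(accuracy(predictions, labels), 1))  # 4 of 5 correct -> 80.0
```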

Given some recent, more qualitative testing, I spent most of my energy tweaking and massaging the MFCCs and the related statistical analyses.

All of this was also with a 256-sample analysis window, a hop of 64, and `@padding 2`, so just 7 frames of analysis across the board. And all the MFCCs were computed with (approximate) loudness-weighted statistics.
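If you want to sanity-check the "7 frames" figure, it falls out of the window/hop arithmetic. This assumes `@padding 2` pads `window - hop` zeros at each end of the slice (my reading of the FluCoMa padding modes, so treat that as an assumption):

```python
# Frame count for a 256-sample slice analysed with window 256 / hop 64,
# assuming @padding 2 adds (window - hop) zeros at each end.

def num_frames(n_samples, window, hop, pad_each_side):
    padded = n_samples + 2 * pad_each_side
    return (padded - window) // hop + 1

window, hop = 256, 64
print(num_frames(256, window, hop, window - hop))  # -> 7
```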

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////

To save time for skimming, I’ll open with the recipe that got me the best results.

**96.9%**:
13 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
mean std low high (1 deriv)

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////

As mentioned above, I spent most of the time playing with the `@attributes` of `fluid.bufmfcc~`, as I was getting worse results when combining spectral shape, pitch, and (obviously) loudness into the mix.

I remembered some discussions with @weefuzzy from a while back where he said that MFCCs don’t handle noisy input well, which is particularly relevant here as the Sensory Percussion sensor has a pretty variable and shitty signal-to-noise ratio as the enclosure is unshielded plastic and pretty amplified.

So I started messing with `@maxfreq` to see if I could get most of the information I needed/wanted in a smaller overall frequency range (still keeping `@minfreq 200` given how small the analysis window is).

**83.1%**:
20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 5000
mean std low high (1 deriv)

**81.5%**:
20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 8000
mean std low high (1 deriv)

**93.8%**:
20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 10000
mean std low high (1 deriv)

**95.4%**:
20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
mean std low high (1 deriv)

**92.3%**:
20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 14000
mean std low high (1 deriv)

**92.3%**:
20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 20000
mean std low high (1 deriv)

What’s notable here is that accuracy seemed to improve as I raised the overall frequency range, with diminishing returns kicking in at `@maxfreq 12000`. I guess this makes sense, as it covers a pretty wide range but then ignores all the super high frequency stuff that (as it turns out) isn’t helpful for classification.
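One way to picture what `@maxfreq` is doing: the mel bands get redistributed across whatever range you give them, so a lower ceiling spends more filters on the low end while a higher ceiling stretches them out. A rough sketch using the standard HTK mel formula (the 40-band count is FluCoMa's default `@numbands`; this is an approximation for intuition, not the actual internals):

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def band_centers(fmin, fmax, numbands):
    """Evenly mel-spaced band centers between fmin and fmax."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (numbands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(numbands)]

# Raising maxfreq thins out the low-frequency coverage:
for fmax in (5000, 12000, 20000):
    centers = band_centers(200, fmax, 40)
    below_2k = sum(c < 2000 for c in centers)
    print(fmax, "->", below_2k, "of 40 bands below 2 kHz")
```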

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////

I then tried experimenting a bit (somewhat randomly) with adding/removing stats and derivatives. Nothing terribly insightful from this stream other than figuring out that 4 stats (`mean std low high`) with 1 derivative of each seemed to work best.

**92.3%**:
20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
mean std (no deriv)

**93.8%**:
20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
mean std (1 deriv)

**90.8%**:
20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
all stats (0 deriv)

**90.8%**:
20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
all stats (1 deriv)
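For anyone wanting to reproduce the summary stage outside Max, here's a numpy sketch of what "4 stats + 1 derivative" amounts to: per coefficient you keep mean/std/min/max of the frames, plus the same four stats of the frame-to-frame differences. (This is a plain, unweighted approximation; my actual patch used the loudness-weighted stats mentioned up top.)

```python
import numpy as np

def summarize(mfcc_frames):
    """mfcc_frames: (n_frames, n_coeffs) array of per-frame MFCCs.
    Returns one feature vector: 4 stats, plus the same 4 on the 1st derivative."""
    def four_stats(x):
        return np.concatenate([x.mean(0), x.std(0), x.min(0), x.max(0)])
    deriv = np.diff(mfcc_frames, axis=0)  # frame-to-frame differences
    return np.concatenate([four_stats(mfcc_frames), four_stats(deriv)])

frames = np.random.default_rng(0).normal(size=(7, 13))  # 7 frames, 13 coeffs
print(summarize(frames).shape)  # 13 coeffs * 4 stats * 2 -> (104,)
```

So with 13 MFCCs you end up handing the classifier a 104-dimensional vector per hit.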

Then, finally, I tried some variations with a lower number of MFCCs, going with the “standard” 13, which led to the best results (also posted up top).

**83.1%**:
13 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 5000
mean std low high (1 deriv)

**96.9%**:
13 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
mean std low high (1 deriv)

For good measure, I also compared the best result against a version with no zero padding (i.e. `@fftsettings 256 64 256`), and that didn’t perform as well.

**93.8%**:
13 mfccs / startcoeff 1
no padding (256 64 256)
min 200 / max 12000
mean std low high (1 deriv)
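The zero padding question is basically about FFT bin spacing: padding the 256-sample window out to 512 points halves the spacing of the frequency bins (interpolating the spectrum rather than adding new information). Quick arithmetic, assuming a 44.1 kHz sample rate (a guess on my part):

```python
# FFT bin spacing for the two @fftsettings variants, assuming sr = 44100.
sr = 44100
for fft_size in (256, 512):
    bins = fft_size // 2 + 1
    print(fft_size, "->", bins, "bins,", round(sr / fft_size, 1), "Hz apart")
```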

So a bit of an old-school “number dump” post, but I’m quite pleased with the results I was able to get, even if it was tedious to change the settings manually each time.