Regression + Classification = Regressification?

Ok, I got to some coding/tweaking and made a lo-fi “automatic testing” patch where I manually change settings and start the process, then get back quantified results (in the form of a % of accurate identifications).

I’m posting the results here for posterity and hopeful usefulness for others (and myself when I inevitably forget again).

My methodology was to record some training audio where I hit the center of the drum around 30 times, then the edge another 30-40 times (71 hits total), and then send a different recording of pre-labeled hits through the trained patch. The predicted classes were compared against the labels, and the accuracy is the % of hits identified correctly.
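
The accuracy measure itself is trivial to sketch. Here's a hypothetical Python version (the label names and data are made up for illustration; the actual comparison happened inside the Max patch):

```python
# Hypothetical sketch of the accuracy measure: compare predicted classes
# against pre-labeled hits and report the % identified correctly.
# The label names and data here are made up for illustration.

def accuracy(labels, predictions):
    """Percentage of hits whose predicted class matches its label."""
    assert len(labels) == len(predictions)
    correct = sum(1 for lab, pred in zip(labels, predictions) if lab == pred)
    return 100.0 * correct / len(labels)

labels      = ["center", "center", "edge", "edge", "edge"]
predictions = ["center", "edge",   "edge", "edge", "edge"]
print(f"{accuracy(labels, predictions):.1f}%")  # 80.0%
```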

Given some recent, more qualitative testing, I spent most of my energy tweaking and massaging MFCCs and related statistical analyses.

All of this was also with a 256-sample analysis window, with a hop of 64 and @padding 2, so just 7 frames of analysis across the board. And all the MFCCs were computed with (approximate) loudness-weighted statistics.
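
As a sanity check on that frame count: if I understand @padding 2 correctly (it pads winsize - hopsize samples of zeros at each end, so every sample gets a full complement of windows), the 7 frames fall out of the arithmetic like this (my reading of the padding behavior is an assumption):

```python
# Hedged arithmetic check on the "7 frames" claim, assuming @padding 2
# pads (winsize - hopsize) zeros at both ends of the analysis segment.

def num_frames(seg_len, winsize, hopsize, pad_each_side):
    padded = seg_len + 2 * pad_each_side
    return (padded - winsize) // hopsize + 1

print(num_frames(256, 256, 64, 256 - 64))  # 7
```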

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////

To save time for skimming, I’ll open with the recipe that got me the best results.

96.9%:

13 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
mean std low high (1 deriv)
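
To put that recipe in perspective, here's a rough count of the resulting feature-vector size. This assumes @numcoeffs is the number of coefficients returned (starting at @startcoeff) and that each selected stat gets doubled by its 1st derivative; those semantics are my assumption:

```python
# Rough count of the feature-vector size for the winning recipe, assuming
# @numcoeffs gives the number of coefficients returned (starting at
# @startcoeff) and each selected stat is doubled by its 1st derivative.

numcoeffs = 13  # @numcoeffs 13, @startcoeff 1
stats     = 4   # mean, std, low, high
derivs    = 1   # 1 derivative of each stat
dims = numcoeffs * stats * (1 + derivs)
print(dims)  # 104 dimensions per hit
```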

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////

As mentioned above, I spent most of the time playing with the @attributes of fluid.bufmfcc~, as I was getting worse results when adding spectral shape, pitch, and (obviously) loudness into the mix.

I remembered some discussions with @weefuzzy from a while back where he said that MFCCs don’t handle noisy input well, which is particularly relevant here as the Sensory Percussion sensor has a pretty variable and shitty signal-to-noise ratio: the enclosure is unshielded plastic and the signal gets amplified quite a bit.

So I started messing with @maxfreq to see if I could get most of the information I needed/wanted in a smaller overall frequency range (still keeping @minfreq 200, given how small the analysis window is).

83.1%:

20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 5000
mean std low high (1 deriv)

81.5%:

20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 8000
mean std low high (1 deriv)

93.8%:

20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 10000
mean std low high (1 deriv)

95.4%:

20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
mean std low high (1 deriv)

92.3%:

20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 14000
mean std low high (1 deriv)

92.3%:

20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 20000
mean std low high (1 deriv)

What’s notable here is that the accuracy improved as I raised the upper frequency limit, with diminishing returns setting in at @maxfreq 12000. I guess this makes sense, as it covers a pretty wide range but ignores all the super high frequency stuff that (as it turns out) isn’t helpful for classification.
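
A hedged illustration of why @maxfreq changes things so much: the mel filterbank is spread evenly on the mel scale, so raising @maxfreq stretches the same number of filters over a wider span. The sketch below uses the common HTK mel formula, which may not be exactly what fluid.bufmfcc~ uses internally:

```python
# Rough look at how @maxfreq changes the span the mel filterbank covers,
# using the common HTK mel formula (an assumption about the internals).
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

for maxfreq in (5000, 12000, 20000):
    span = hz_to_mel(maxfreq) - hz_to_mel(200)
    print(f"maxfreq {maxfreq}: ~{span:.0f} mels covered")
```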

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////

I then tried experimenting a bit (somewhat randomly) with adding/removing stats and derivatives. Nothing terribly insightful here other than figuring out that 4 stats (mean std low high) with 1 derivative of each seemed to work best.

92.3%:

20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
mean std (no deriv)

93.8%:

20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
mean std (1 deriv)

90.8%:

20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
all stats (0 deriv)

90.8%:

20 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
all stats (1 deriv)

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Then, finally, I tried some variations with a lower number of MFCCs, going with the “standard” 13, which gave the best results (also posted at the top).

83.1%:

13 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 5000
mean std low high (1 deriv)

96.9%:

13 mfccs / startcoeff 1
zero padding (256 64 512)
min 200 / max 12000
mean std low high (1 deriv)

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////

For good measure, I also compared the best recipe against a version with no zero padding (i.e. @fftsettings 256 64 256), which didn’t perform as well.

93.8%:

13 mfccs / startcoeff 1
no padding (256 64 256)
min 200 / max 12000
mean std low high (1 deriv)
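
For context on what the padding is doing: zero padding each 256-sample window up to a 512-point FFT roughly doubles the number of spectral bins, interpolating a smoother spectrum without changing the underlying frequency resolution. A quick sketch of the bin counts:

```python
# Bin counts for the two @fftsettings variants: zero padding a 256-sample
# window to a 512-point FFT doubles the bins (a smoother, interpolated
# spectrum), though the underlying frequency resolution stays the same.

def rfft_bins(fft_size):
    # number of unique bins in a real-input FFT of this size
    return fft_size // 2 + 1

print(rfft_bins(256))  # 129 bins: no padding (256 64 256)
print(rfft_bins(512))  # 257 bins: zero padded (256 64 512)
```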

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////

So a bit of an oldschool “number dump” post, but I’m quite pleased with the results, even if it was tedious to manually change the settings each time.
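
If I ever revisit this, the tedious part could probably be scripted. This isn't how the patch works, just a hypothetical sketch of enumerating the setting combinations tried above:

```python
# Hypothetical sketch of automating the parameter sweep (the real testing
# was done by hand in Max): enumerate the setting combinations tried above.
from itertools import product

numcoeffs_opts = [13, 20]
maxfreq_opts   = [5000, 8000, 10000, 12000, 14000, 20000]
fft_opts       = [(256, 64, 512), (256, 64, 256)]

grid = list(product(numcoeffs_opts, maxfreq_opts, fft_opts))
print(len(grid))  # 24 combinations
for numcoeffs, maxfreq, fft in grid:
    pass  # here: retrain, classify the test recording, log the accuracy
```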
