Aaand a bit of classic number crunching.
Compared the MFCC and Spectral descriptors in terms of raw/straight classification (à la all the experiments from this thread) and got the following results.
The labeled musical example I gave it had only 4 classes, so I ran the tests with a classifier trained on just those 4 classes and with one trained on all 10 classes. I then mainly experimented with whether or not to include loudness compensation (this helped my MFCC accuracy) and whether or not to include derivatives.
The base recipes are as follows:
MFCC baseline:
13 MFCCs / startcoeff 1
zero padding (256 64 512)
min freq 200 / max freq 12000
mean std low high (1 deriv)
Spectral baseline:
all moments / power 1 / unit 1
zero padding (256 64 512)
max freq 20000
mean std low high (1 deriv)
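For anyone following along outside of Max, here is a rough sketch of what the "mean std low high (1 deriv)" line in both recipes amounts to, assuming the descriptors arrive as a frames-by-descriptors array. The function name summarize and its parameters are made up for illustration, and I'm assuming "low" and "high" are the 0th and 100th percentiles (i.e. min and max); this is plain numpy, not the actual analysis chain used for the numbers in this post.

```python
import numpy as np

def summarize(frames, num_derivs=1, low=0.0, high=100.0):
    """Collapse per-frame descriptors into one flat feature vector.

    frames: array-like of shape (n_frames, n_descriptors).
    For the raw frames and for each derivative up to num_derivs, this
    appends mean, std, low percentile and high percentile per descriptor.
    low=0 / high=100 (min and max) are an assumption, not a confirmed setting.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim != 2 or frames.shape[0] < num_derivs + 1:
        raise ValueError("need a 2-D array with more than num_derivs frames")

    parts = []
    current = frames
    for _ in range(num_derivs + 1):
        parts.extend([
            current.mean(axis=0),
            current.std(axis=0),
            np.percentile(current, low, axis=0),
            np.percentile(current, high, axis=0),
        ])
        # First difference between consecutive frames = next derivative order.
        current = np.diff(current, axis=0)
    return np.concatenate(parts)

# Hypothetical input: 5 analysis frames of 13 MFCCs (random placeholder data).
rng = np.random.default_rng(0)
fake_mfccs = rng.normal(size=(5, 13))
with_deriv = summarize(fake_mfccs, num_derivs=1)
no_deriv = summarize(fake_mfccs, num_derivs=0)
print(with_deriv.shape, no_deriv.shape)
```

Under these assumptions, 13 MFCCs with 4 stats and 1 derivative come out as 13 x 4 x 2 = 104 values per sound, and the "no deriv" variants halve the length of the feature vector.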
//////////////////////////////////////////////////////////
The results:
4 classes:
mfcc baseline - 95.8333% (my current “gold standard”)
spectral baseline - 87.5%
spectral no loudness - 87.5%
spectral no deriv - 88.88%
10 classes:
mfcc baseline - 86.11%
spectral baseline - 66.66%
spectral no loudness - 66.66%
spectral no deriv - 72.22%
//////////////////////////////////////////////////////////
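So it looks like I’m probably better off without derivatives for these spectralshape descriptors: dropping them nudged accuracy up in both tests (87.5% to 88.88% with 4 classes, 66.66% to 72.22% with 10 classes), while removing loudness made no difference in either. MFCCs are still comfortably ahead in both cases.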