Curious how to deal with this when it comes to secondary/tertiary stats (skewness of the 2nd derivative of loudness, etc.). With things like perceptual descriptors, or to a certain extent MFCCs, the scale and boundaries are semi-sensible, just different from each other, whereas some of the other stuff gets fairly irrational right out of the gate.
I primarily ask because one of the main things I'd be interested in "representing" is a sense of morphology, so I imagine derivatives and related stats being significant in capturing that.
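Just to make that concrete, here's a rough sketch (Python/scipy, purely illustrative rather than any particular FluCoMa workflow, with placeholder loudness curves) of the kind of derivative stats I mean, and of standardizing them so their "irrational" ranges don't swamp the more sensible ones in a distance metric:

```python
import numpy as np
from scipy.stats import skew

def morph_stats(loudness_curve):
    """Summarize a per-frame loudness curve (dB) with derivative-based stats.

    Returns mean/std of the curve plus skewness of its 1st and 2nd
    derivatives -- the secondary/tertiary stats whose scales are far
    less "sensible" than the raw descriptor itself.
    """
    d1 = np.diff(loudness_curve)          # 1st derivative (frame-to-frame change)
    d2 = np.diff(loudness_curve, n=2)     # 2nd derivative (change of the change)
    return np.array([
        loudness_curve.mean(),
        loudness_curve.std(),
        skew(d1),
        skew(d2),
    ])

# Stack stats for a whole corpus, then standardize each column so the
# derivative skews sit on the same footing as the loudness stats
# before any distance/kdtree work.
corpus = [np.random.randn(200).cumsum() for _ in range(34)]   # placeholder curves
X = np.vstack([morph_stats(c) for c in corpus])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```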
I’m curious what makes for a good set of labels. Last year I made some training/testing hits with my prepared snare stuff so that I could test/compare, which is what I used for all the stats/verification I did in this thread. I tailored it to my intended use case, where I have single/individual “training” hits and then 5 separately recorded “testing” hits, since my approach (at the time) was more about building a kdtree with loads of individual hits which would act as “classes”, with knearest giving me the nearest match. I did it that way because I wanted an arbitrary number of “classes”, with only a single example of each.
For the purposes of testing I created 34 of these (as in, 34 individual hits, and then 5 additional hits of each of the 34, so 170 “testing” hits).
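For what it's worth, the shape of that setup in plain Python/scikit-learn terms (feature vectors and label names are placeholders, random data standing in for the actual snare analyses) is basically one entry per "class" in the tree and a single-nearest-neighbour query per testing hit:

```python
import numpy as np
from sklearn.neighbors import KDTree

n_classes, n_dims = 34, 12                     # 34 single-example "classes"
rng = np.random.default_rng(0)
train = rng.normal(size=(n_classes, n_dims))   # one feature vector per training hit
labels = [f"hit_{i:02d}" for i in range(n_classes)]

tree = KDTree(train)

# 5 separately recorded "testing" hits per class -> 170 queries,
# each one just asking the tree for its single nearest neighbour.
test = train[np.repeat(np.arange(n_classes), 5)] \
       + rng.normal(scale=0.1, size=(n_classes * 5, n_dims))
_, idx = tree.query(test, k=1)
predicted = [labels[i] for i in idx.ravel()]
accuracy = np.mean([p == labels[i // 5] for i, p in enumerate(predicted)])
```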
But I guess in general it’s good to have samples where the names of the files have symbolic meaning? (instrument, pitch, dynamics typically).
If so, that makes me think I should be able to harvest some great test corpora from some of the sample libraries I have, where I can break off sections of them based on instrument (“spoons” vs “crowbar” etc.).
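If the filenames do carry that symbolic meaning, harvesting labels is mostly string-splitting; a hedged sketch, assuming a made-up `instrument_dynamic_index.wav` naming scheme and a `corpus` folder:

```python
from pathlib import Path

def label_from_filename(path):
    """Pull (instrument, dynamic) out of e.g. 'spoons_ff_03.wav'.

    The naming scheme here is hypothetical -- adjust the split to
    whatever convention the sample library actually uses.
    """
    instrument, dynamic, _ = Path(path).stem.split("_")
    return instrument, dynamic

labels = {p.name: label_from_filename(p) for p in Path("corpus").glob("*.wav")}
```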
I have it on my list of things to test again, just to revisit the speed “problem”, but I’m curious about your thoughts/experience on pre-fit PCA vs MLP as part of a process chain, in terms of how fast they are. I guess in both cases they get broken down to multiplications/calculations, so it’s just a matter of passing the data around.
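My (so far untested) assumption is exactly that: once fitted, a PCA transform is one matrix multiply and an MLP forward pass is a few of them plus nonlinearities. A crude timing sketch along these lines (scikit-learn, arbitrary sizes, random data) is roughly how I'd revisit the per-point cost:

```python
import time
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 40))            # e.g. 40 raw descriptor/stat columns

# Fit both once, offline; only the per-point transform speed matters after that.
pca = PCA(n_components=8).fit(X)
mlp = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=200).fit(X, pca.transform(X))

point = X[:1]                              # a single incoming analysis frame
for name, fn in [("pca", pca.transform), ("mlp", mlp.predict)]:
    t0 = time.perf_counter()
    for _ in range(1000):
        fn(point)                          # per-point cost, as in a realtime chain
    print(name, (time.perf_counter() - t0) / 1000)
```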
As @tedmoore mentioned, having some legibility, i.e. still-perceptually-meaningful descriptors in the mix, is definitely desirable, as you can then do perceptual querying ((loudness > -10 && centroid >= 20) && knearest), which would be impossible if you boil everything down to abstract/reduced descriptors. But at the same time, I’m not completely averse to doing some overall descriptor mashing (if it isn’t (much) slower than not doing it).
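In Python-ish terms (not FluCoMa syntax, and with placeholder columns/units), that hybrid query is basically a boolean mask on the legible descriptors followed by a nearest-neighbour search restricted to whatever survives the filter:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(2)
n = 500
loudness = rng.uniform(-40, 0, n)          # legible, perceptual columns
centroid = rng.uniform(0, 130, n)          # (units here are placeholders)
features = rng.normal(size=(n, 10))        # the reduced/abstract space

# Perceptual predicate first: (loudness > -10 && centroid >= 20) ...
mask = (loudness > -10) & (centroid >= 20)
subset = np.flatnonzero(mask)

# ... && knearest, but only among the entries that passed the filter.
tree = KDTree(features[subset])
query = rng.normal(size=(1, 10))
_, idx = tree.query(query, k=1)
nearest = subset[idx.ravel()[0]]
```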