Moving this in here so it’s not in the “public” part of the forum; it also relates to the discussion in this thread.
I’m thinking about doing something similar-ish, or rather, trying to create macro descriptors: chunking a bunch of loudness descriptors/stats (perhaps even per-frame) and reducing that down to 1 or 2 dimensions, doing the same with MFCCs/descriptors/stats and bringing that down, etc…
For my analysis time frame and general use case, pitch isn’t as important, so I’m not sure what to do on that front, but the general idea is to reduce a mixed/large descriptor space to a smaller number of dimensions, grouped by perceptually related subcategories… Something like the sketch below.
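Roughly what I mean, as a Python mock-up (the array names and sizes are made up, and I’m using PCA here just as a placeholder for whatever reduction ends up making sense):

```python
# A minimal sketch of the "macro descriptor" idea, assuming per-slice
# loudness stats and MFCC stats have already been extracted into arrays.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def reduce_group(features, n_dims=2):
    """Standardize one group of descriptors/stats and reduce it to n_dims."""
    scaled = StandardScaler().fit_transform(features)
    return PCA(n_components=n_dims).fit_transform(scaled)

# hypothetical inputs: one row per analysis slice
loudness_stats = np.random.rand(500, 14)  # e.g. loudness mean/std/derivative stats
mfcc_stats = np.random.rand(500, 91)      # e.g. 13 MFCCs x 7 stats each

loudness_macro = reduce_group(loudness_stats, n_dims=2)  # shape (500, 2)
timbre_macro = reduce_group(mfcc_stats, n_dims=2)        # shape (500, 2)
```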
Is the idea to give the KDTree at the end the same number of entries per thing you find important (as well as scaling them appropriately)? So if the KDTree has six dimensions in it, two each for L, P, and T, it should give them equal significance in finding the nearest point?
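In other words, something like this (again a made-up mock-up, not a real patch): if each category contributes the same number of columns and every column is standardized to unit variance, the Euclidean distance the tree uses should weight the categories roughly equally.

```python
# Sketch of the weighting question: 6-column space, two columns each for
# loudness, pitch, and timbre, all standardized before building the tree.
import numpy as np
from scipy.spatial import cKDTree
from sklearn.preprocessing import StandardScaler

n_slices = 500
loudness_macro = np.random.rand(n_slices, 2)  # hypothetical 2D loudness macro descriptor
pitch_macro = np.random.rand(n_slices, 2)     # hypothetical 2D pitch macro descriptor
timbre_macro = np.random.rand(n_slices, 2)    # hypothetical 2D timbre macro descriptor

# stack into a 6-column space and give every column unit variance
space = StandardScaler().fit_transform(
    np.hstack([loudness_macro, pitch_macro, timbre_macro]))

tree = cKDTree(space)
dist, idx = tree.query(space[0], k=5)  # 5 nearest slices to the first slice
print(idx)
```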