So in light of new bits coming out soon, I’ve been thinking about how I might apply dimensionality reduction.
I’m sure a bunch more will become clear once we have the new tools, and @tremblap mentioned doing a video walkthrough of the new stuff, which is great.
So if you have a large multi-dimensional descriptor space, you can apply some dimensionality reduction to it and end up with a descriptor space that is better(?!) suited for searching, or at least more efficient, or different, etc…
That makes sense.
So if I have a source and I want to find nearest matches, both of these would (presumably?) be run through the same descriptor and dimensionality reduction algorithms, so that (after normalization/standardization/sanitation/whatever) one could browse for one with the other.
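Something like this toy sketch is how I currently picture that pipeline (Python/scikit-learn purely for thinking out loud, with made-up data and a made-up descriptor count, not the actual FluCoMa objects):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

# stand-in corpus: 3000 entries x 20 descriptors/stats (numbers invented)
corpus = np.random.rand(3000, 20)

# fit the standardization and the reduction on the corpus only
scaler = StandardScaler().fit(corpus)
reducer = PCA(n_components=3).fit(scaler.transform(corpus))
corpus_reduced = reducer.transform(scaler.transform(corpus))

# build the nearest-neighbour structure in the reduced space
tree = NearestNeighbors(n_neighbors=1).fit(corpus_reduced)

# a query from the source goes through the *same* scaler and reducer
query = np.random.rand(1, 20)
query_reduced = reducer.transform(scaler.transform(query))
dist, idx = tree.kneighbors(query_reduced)
print("nearest corpus entry:", idx[0][0])
```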
I also gather that the way the descriptor space is processed will (also) have a large impact on the matching and such. I’m still with it here.
So where I’m a bit lost is how this would apply when working with disparate spaces where you want minimal latency. Specifically the stuff I’ve been working on, where I have multiple stages of analysis applied to incoming audio, with varying numbers of descriptors and statistics analyzed at each stage. How does this relate to a much larger descriptor space where analysis time is no issue?
More concretely, an example.
Snare input, super low latency, with an initial analysis window of 64 samples, and perhaps a 512 sample latency overall (with another analysis stage in there that is staggered).
And then using it to navigate a 3k+ sample library of metal sounds.
64 samples is not a lot of time, and certain descriptors don’t make sense at that time scale (e.g. pitch). So for that initial burst I mainly just do loudness, centroid, and flatness. The longer my analysis window, the more descriptors/stats I start incorporating. Meaning I have a variable number of dimensions that I’m starting off with.
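Just to make the variable-dimension thing concrete, the stages look roughly like this (only the 64-sample list is what I actually do; the longer-window lists are placeholders):

```python
# window length (samples) -> descriptors/stats available at that stage
stages = {
    64:   ["loudness", "centroid", "flatness"],             # the initial burst
    256:  ["loudness", "centroid", "flatness",
           "rolloff", "spread"],                             # placeholder list
    4096: ["loudness", "centroid", "flatness", "rolloff",
           "spread", "pitch", "pitch_confidence"],           # placeholder list
}
for window, descriptors in stages.items():
    print(f"{window} samples -> {len(descriptors)} dimensions")
```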
So is the general idea that I would apply dimensionality reduction to the large sample library and end up with x amount of dimensions (let’s call it 3 for now) which describe the overall descriptor space?
I then have my input/real-time analysis, with far fewer dimensions available. Do I also reduce that down to 3 dimensions? Is whatever is significant about the dimensionally reduced real-time stuff somehow mappable onto another, separately reduced space?
Like if my drums tend to be muffled hits, perhaps timbre isn’t massively important, whereas the opposite may be the case for the sample libraries.
Basically I’m having conceptual trouble figuring out how dimensionality reduction and mapping/querying of spaces work when the source/target material are very different.
Obviously all the normalization/standardization would, potentially, mitigate the differences in scale for everything, but perhaps not the significance of what the algorithm(s) have chosen to reduce down to.
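One way I can imagine bridging that (pure speculation on my part, not something I’ve seen suggested anywhere): only feed the reduction the descriptor columns that the real-time side can actually deliver at a given stage, so the query and the corpus end up in the same reduced space. A sketch, again with invented data and column names:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

all_descriptors = ["loudness", "centroid", "flatness",
                   "rolloff", "spread", "pitch", "pitch_confidence"]
corpus_full = np.random.rand(3000, len(all_descriptors))  # stand-in corpus

# descriptors available once the longer (e.g. 512-sample) window has landed
shared = ["loudness", "centroid", "flatness", "rolloff", "spread"]
cols = [all_descriptors.index(d) for d in shared]
corpus_shared = corpus_full[:, cols]

# fit standardization + reduction on the shared columns only
scaler = StandardScaler().fit(corpus_shared)
reducer = PCA(n_components=3).fit(scaler.transform(corpus_shared))
tree = NearestNeighbors(n_neighbors=1).fit(
    reducer.transform(scaler.transform(corpus_shared)))

# a real-time frame with exactly those descriptors lives in the same space
rt_frame = np.random.rand(1, len(shared))
dist, idx = tree.kneighbors(reducer.transform(scaler.transform(rt_frame)))
```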
And then finally, the issue of speed/latency. At the moment, in the other thread, I’m working out a multi-stage analysis approach where the initial tiny fragment is matched, crudely, with something from the database, and the match is then refined stage by stage. In a dimensionally reduced ML paradigm, this wouldn’t work(?).
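For reference, here is roughly what I mean by the multi-stage thing, as a candidate-narrowing sketch (invented data and sizes again, and not remotely optimized for real-time use):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

corpus_stage1 = np.random.rand(3000, 3)   # loudness/centroid/flatness stats
corpus_stage2 = np.random.rand(3000, 10)  # fuller descriptor set per entry

s1 = StandardScaler().fit(corpus_stage1)
s2 = StandardScaler().fit(corpus_stage2)
tree1 = NearestNeighbors(n_neighbors=50).fit(s1.transform(corpus_stage1))

# stage 1: crude match on the 64-sample features, keep 50 candidates
frame1 = np.random.rand(1, 3)
_, candidates = tree1.kneighbors(s1.transform(frame1))
candidates = candidates[0]

# stage 2: once the longer window lands, re-rank only those candidates
tree2 = NearestNeighbors(n_neighbors=1).fit(
    s2.transform(corpus_stage2[candidates]))
frame2 = np.random.rand(1, 10)
_, best_local = tree2.kneighbors(s2.transform(frame2))
print("refined match:", candidates[best_local[0][0]])
```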
Is the idea that once you go into dimensionality reduction and ML querying, it’s an all-or-nothing approach?
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
So yeah, lots of conjecture and spitballing here, but already priming my brain for upcoming bits.
Main takeaway question(s), I guess, can be boiled down to:
How does dimensionality reduction work when you have (sonically) disparate spaces with varying numbers of underlying descriptors and dimensions?