A bit of a bump here, although on a different course of discussion.
When I first made this thread, I wasn't sure what approach was best for what I wanted to do (use small windows to predict bigger windows, then use those as matching criteria), but now I think I have a better handle on it.
Or rather, I had a better handle on it.
My work at the moment has been on building a big enough fluid.kdtree~ such that a tiny (256-sample) analysis window can be matched to the nearest longer (4410-sample) analysis window, then combining the two to find the nearest overall match, but faster.
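Roughly what I mean, sketched in Python/sklearn rather than Max (the arrays, dimensions, and random data here are all made-up stand-ins for the real per-hit descriptors):

```python
# Sketch of the current (pre-regressor) approach: one row per hit,
# short-window (256) descriptors alongside the long-window (4410)
# descriptors of the same hit. All data here is hypothetical.
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
n_hits = 1000
short_feats = rng.normal(size=(n_hits, 10))   # e.g. loudness/pitch/centroid stats at 256
long_feats = rng.normal(size=(n_hits, 30))    # the same hits analysed at 4410

# Tree over the short-window space: a live 256 analysis finds the hit
# whose *short* descriptors are closest...
short_tree = KDTree(short_feats)

live_short = rng.normal(size=(1, 10))          # incoming real-time analysis
_, idx = short_tree.query(live_short, k=1)
matched_long = long_feats[idx[0, 0]]           # ...and borrows that hit's long-window data

# Combine the real short analysis with the matched long analysis to
# query the full corpus in the combined space.
combined_corpus = np.hstack([short_feats, long_feats])
combined_tree = KDTree(combined_corpus)
query = np.hstack([live_short, matched_long[None, :]])
_, best = combined_tree.query(query, k=1)
print("nearest hit index:", best[0, 0])
```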
I knew a classifier wasn't what I was after, as I was going to have hundreds/thousands of individual hits which may or may not repeat or be similar to each other. I wanted a large pool of "most of the sounds I can make with my snare".
At the last geekout, as @tremblap was explaining why my regressor wasn't converging, we ended up on a tangent (prompted by @tedmoore's questions) which brought me back to thinking about using a regressor for this purpose.
So I’ve been thinking about this a bit, but I’m kind of confused as to what numbers I should have on each end.
So for the input, I want enough descriptors/stats to have a well-defined and differentiated space, as the primary features. And at the output I would then want (potentially) musically meaningful descriptors/stats, which would then be used to query a fluid.kdtree~. In reality I would probably still want to take info from both, since the 256 would be "real" and the 4410 would be "predicted" (with some error). So I can kind of wrap my head around this a bit. I guess there may be an asymmetry to things, as the regressor (as far as I understand) doesn't care about the types of data on each end. So I could potentially have a very small/tight set of descriptors/stats going in, and a much broader set of descriptors coming out.
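Something like this, again sketched in Python/sklearn standing in for fluid.mlpregressor~ (dimensions are made up: a tight 10-d short-window input predicting a broader 30-d long-window output, which is exactly the asymmetry in question):

```python
# The regressor doesn't care that the two ends differ in size or in
# descriptor type: a small input vector can map to a larger output one.
import numpy as np
from sklearn.neighbors import KDTree
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_hits = 1000
short_feats = rng.normal(size=(n_hits, 10))
# Pretend the long-window stats are some (noisy) function of the short
# ones, as they plausibly would be for real hits.
long_feats = np.tanh(short_feats @ rng.normal(size=(10, 30))) \
    + 0.05 * rng.normal(size=(n_hits, 30))

# Standardise both ends, as the MLP expects.
in_scaler, out_scaler = StandardScaler(), StandardScaler()
X = in_scaler.fit_transform(short_feats)
Y = out_scaler.fit_transform(long_feats)

mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=1)
mlp.fit(X, Y)

# At play time: real 256 analysis in, predicted 4410 analysis out, then
# query the corpus with "real + predicted" stacked together.
live_short = rng.normal(size=(1, 10))
pred_long = out_scaler.inverse_transform(
    mlp.predict(in_scaler.transform(live_short)))

corpus_tree = KDTree(np.hstack([short_feats, long_feats]))
_, idx = corpus_tree.query(np.hstack([live_short, pred_long]), k=1)
print("nearest hit:", idx[0, 0])
```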
So that asymmetry is a bit of a headfuck in terms of there being loads of variables to try/test, with fragility at each possible test.
But where I have a more concrete question is about the nature of the numbers that will be interpolated. Say I have a descriptor space with loudness/pitch/centroid, and then interpolate between points. I imagine it wouldn't be perfect, but I could see a regressor "connecting the dots" in a way that's probably useful and realistic. But if I have a bunch of MFCCs, or even worse, MFCCs/stats that have been UMAP'd, will interpolating between those points potentially yield anything "real"? As in, if I have more abstract features on the output side of the regressor training, will that lead to useless data when interpolating between points?
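One way I could imagine testing this empirically rather than guessing (a sketch only, reusing the hypothetical mlp/X/Y from above): interpolate between two real hits in the input space, run the in-between inputs through the trained regressor, and measure how far each prediction lands from its nearest real point in the output space. For loudness/pitch/centroid those distances should stay small; for MFCCs or a UMAP'd space, large distances would suggest the in-between predictions aren't "real" points at all.

```python
# Probe whether interpolated predictions stay near the real data
# (small distances = plausible, large = off-manifold). Hypothetical
# helper; mlp/X/Y come from the previous sketch.
import numpy as np
from sklearn.neighbors import KDTree

def interpolation_realness(mlp, X, Y, i, j, steps=9):
    """Distance from interpolated predictions to the nearest real output."""
    out_tree = KDTree(Y)
    ts = np.linspace(0.0, 1.0, steps)
    probes = np.array([(1 - t) * X[i] + t * X[j] for t in ts])
    preds = mlp.predict(probes)
    dists, _ = out_tree.query(preds, k=1)
    return dists.ravel()

# e.g. print(interpolation_realness(mlp, X, Y, 0, 1))
```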