Fluid.kdtree~ as entrymatcher~ replacement(ish)

So after getting something kind of useful out of fluid.knn~ in the other thread (I’m going to do some more testing tomorrow and see how I get on with some additional stuff @tremblap mentioned offline) I wanted to see what else I can make sense of from TB2.

In seeing how useful the MFCCs are for timbral matching, I was thinking about incorporating that into my onset descriptors thing when working with corpus navigation, but as a single “thing”. Meaning if I’m primarily using loudness, centroid, and flatness, I can’t just chuck 13 new numbers in there, as those would be disproportionately weighted during the matching/querying process, not to mention that the scale of things would be all off (as per @tremblap’s APT (team ATP here!)).

Ideally I’d be able to do something like weigh loudness and centroid equally, then perhaps consider a bunch of the other spectral descriptors as a combined “item”, and then MFCCs as another “item”, plus fitting in a bunch of statistics in there too.
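To make the “combined item” idea concrete, here is a minimal sketch in plain Python (not FluCoMa code) of one way to pre-scale descriptor groups before they go into a dataset. It assumes the values are already standardised, and it assumes Euclidean distance for the matching. The names `weight_groups`, `distance` and `GROUPS` are made up for illustration. Each group is scaled by sqrt(weight / width), so 13 MFCCs together pull about as hard as one loudness value.

```python
import math

def weight_groups(point, groups):
    """Flatten a point, scaling each descriptor group to act like one 'item'.

    point:  dict of group name -> list of already-standardised values
    groups: dict of group name -> relative weight of the whole group
    """
    scaled = []
    for name, weight in groups.items():
        values = point[name]
        # Dividing by the group width (under the square root) stops wide
        # groups such as 13 MFCCs from dominating the squared distance.
        factor = math.sqrt(weight / len(values))
        scaled.extend(v * factor for v in values)
    return scaled

def distance(a, b):
    """Plain Euclidean distance between two flat lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical weights: every group counts equally.
GROUPS = {"loudness": 1.0, "centroid": 1.0, "mfcc": 1.0}

# Two made-up points that differ by 1.0 in every single dimension.
a = {"loudness": [0.0], "centroid": [0.0], "mfcc": [0.0] * 13}
b = {"loudness": [1.0], "centroid": [1.0], "mfcc": [1.0] * 13}

weighted = distance(weight_groups(a, GROUPS), weight_groups(b, GROUPS))
unweighted = distance(a["loudness"] + a["centroid"] + a["mfcc"],
                      b["loudness"] + b["centroid"] + b["mfcc"])
```

With these made-up points the unweighted distance is dominated by the 13 MFCC dimensions, while the weighted one treats the three groups as equals. Changing a value in `GROUPS` is then the single knob for “care more about loudness”.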

From this I remember @tremblap’s patch from the plenary day that did some AudioGuide-type things. In looking back at the patch, it’s not terribly clear what exactly is happening at each stage (although I did manage to get it to work!), but the core nuggets of it seem like they could be useful for “generic” querying/matching stuff.

So from the looks of it, points are shoved into fluid.dataset~, which needs a name and a size (à la entrymatcher~). This could presumably be something like this:

1, "METAL RESONANCE DROPS Hard concrete ANGLE IRON 01 - 01.wav" 1956.507937 1023.095805 4 -17.815588 -0.042764 3.267899 -17.815588 2205. 0.303731 538.381897 1474.484863 7241.071289 2.690974 351.409302 7550.347168 -7.610556 0.003646 0.61819 -9.106574 20943.193359 5.410414 1321.936401 19468.511719 5551.549805 0.94741 138.995499 5434.257324;
2, "METAL RESONANCE DROPS Hard concrete ANGLE IRON 01 - 02.wav" 2202.721088 1181.382368 2 -28.928728 -0.037673 2.115414 -28.928728 1284.415771 0.42789 1061.948853 539.157104 6360.745117 1.902401 195.140411 5399.84082 -10.174833 0.002133 0.359874 -10.33946 20943.193359 4.831788 780.422302 10516.023438 4918.262207 0.614422 69.877556 4570.10791;

Then the named dataset is passed into a fluid.kdtree~, and then I can simply query it for kNearest [buffer/dataset] 1 and kNearestDist [buffer/dataset] 1 and it should return the index(?) (which I would then use to play back the matched sample/entry).
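As a rough picture of what a nearest-neighbour query hands back, here is a small Python sketch. It is a brute-force search standing in for the kd-tree (a kd-tree is a faster way of getting the same answer), and the `k_nearest` function, the labels and the numbers are all hypothetical, not the fluid.kdtree~ interface.

```python
import math

def k_nearest(dataset, query, k=1):
    """Return the k closest (label, distance) pairs, nearest first.

    dataset: dict of label -> list of floats (one entry per point)
    query:   list of floats with the same number of dimensions
    """
    scored = [(label, math.dist(data, query)) for label, data in dataset.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Made-up two-dimensional points; real entries would hold the descriptors.
dataset = {"1": [0.0, 0.0], "2": [3.0, 4.0], "3": [6.0, 8.0]}

nearest = k_nearest(dataset, [2.5, 4.0], k=1)
two_nearest = k_nearest(dataset, [2.5, 4.0], k=2)
```

Here `nearest` holds the label of the closest point together with its distance, which is the pair of things the two queries above are after: the label says which sample to play, the distance says how good the match is.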

Am I on the right track here?

I guess this wouldn’t necessarily solve the problem of overweighting something like MFCCs, as that would take up the same number of dimensions in a fluid.kdtree~ as it would in an entrymatcher~, which is, I suppose, where data reduction stuff will come in handy.

So is what I’m after possible at the moment, and/or is it just a question of data reduction which is independent of querying/matching?

This is on my radar too, but my first challenge is to find what I did well in LPT (loudness more than amplitude :smile:). I sanitised the data, making sure I was weighting 1 dB to 1 semitone in a perceptual space. This is crude but gave me better results than just normalisation or standardisation (and you can hear the difference in the code I gave). What is a challenge is to replace the (bad) timbral descriptor (centroid) with a good one (MFCC) but to weigh them in the same unit of distance… I’m playing with this at the moment when I have a chance between all the other things.
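For anyone following along, the “1 dB to 1 semitone” idea can be sketched like this in Python. This is not the LPT patch itself, just an illustration under the assumption that loudness is already in dB and pitch arrives in Hz. The Hz to MIDI conversion is the standard one, and the function names are made up.

```python
import math

def hz_to_midi(freq_hz):
    """Standard conversion: 440 Hz is MIDI note 69, 12 notes per octave."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def to_perceptual(loudness_db, pitch_hz):
    """Map (loudness, pitch) so that 1 dB and 1 semitone are both one unit."""
    return [loudness_db, hz_to_midi(pitch_hz)]

reference = to_perceptual(-20.0, 440.0)
one_db_louder = to_perceptual(-19.0, 440.0)
one_semitone_up = to_perceptual(-20.0, 440.0 * 2.0 ** (1.0 / 12.0))

d_loudness = math.dist(reference, one_db_louder)
d_pitch = math.dist(reference, one_semitone_up)
```

Both distances come out at one unit, which is the point: a step of 1 dB and a step of 1 semitone move the query equally far, rather than letting raw Hz values swamp everything else.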

Dimension reduction, and more subtle and powerful querying, is what we are working on (again in parallel) which will allow you to do some powerful processing of your descriptor space.

There are again many ways there. In LPT’s case, I have a [coll] that has all the indices of all my slices in one gigantic buffer. I use the same ID for each slice in there, and in my dataset, so I can retrieve a ‘point’ with its ‘label’ which is that number.

Owen has done other examples in the same folder where he uses dicts. I am testing new versions of that this very morning to help people build quickly datasets as they wish, should data preprocessing not be their jam.

So I would suggest exploring the distances you get at the moment with small patches, but not go yet for the big data structure replacement, since other tools are on their way… or you can do it as you want, which is useful for us (your questions are always welcome) but more painful than it’ll be in the next version I think… but maybe that iterative design gives you the opportunity to influence the design, so your input is welcome :slight_smile:

Yeah the sanitization stuff is useful. I guess dB and MIDI notes are “close enough”, so if I remember right, more of what you did was changing their ‘default’ values to not overly weigh towards one direction (centroid = middle, and pitch = low). For my purposes I’m scrapping pitch since I’ve not really been able to get a useful result out of a 512 sample window (with even smaller fft size), and since it’s percussion/drum shit, I’m not bothered either way.

MFCCs I guess are the really dirty ones in terms of range/weight/etc…

That’s good news. Will some of this be available in the next software release?

Ah right, so the label is just a pointer, which you can then go and manually retrieve elsewhere if you need the data contained within. (I was struck after posting this that you can’t store strings in buffers (easily), so that limits some of what can be queried, but if the data is contained/managed elsewhere, then it doesn’t matter so much, although it does add an extra step (and potentially latency) to the process.)

So my takeaway from this is to wait, for now.

I look forward to the next set of tools (with hopefully some clearer info about the ones from the first batch, as I made very little sense of these until only very recently)

No promises here, but we are working on them for realz :slight_smile: We prefer to talk about releases rather than dumps though…

let’s use the vocab we have for now. Each entry in a dataset is called a point, which has an ID (a unique number, hidden for now), a label (a unique string of your choosing) and a data array of width ‘cols’ (many floats in a row from a buffer for now)

so the label is a unique pointer to that data in that dataset, yes. in the LPT example, I use INTS as they allow me to refer to a parallel coll with the same ‘label’/‘id’/‘address’
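The vocabulary above can be sketched as a data structure. This is only a Python illustration of the description (unique label, fixed-width row of floats, plus a parallel lookup keyed by the same label, like the coll in the LPT example). The `Dataset` class, its methods and the slice numbers are hypothetical and are not the fluid.dataset~ interface.

```python
class Dataset:
    """A toy dataset: each point is a unique label plus 'cols' floats."""

    def __init__(self, cols):
        self.cols = cols
        self.points = {}  # label -> list of floats

    def add_point(self, label, data):
        if label in self.points:
            raise KeyError(f"label {label!r} is already in use")
        if len(data) != self.cols:
            raise ValueError(f"expected {self.cols} values, got {len(data)}")
        self.points[label] = list(data)

    def get_point(self, label):
        return self.points[label]

# Descriptor data lives in the dataset...
descriptors = Dataset(cols=3)
descriptors.add_point("1", [-17.8, 60.0, 0.3])
descriptors.add_point("2", [-28.9, 72.0, 0.4])

# ...while the playback info sits in a parallel table using the same
# labels (made-up start/end sample positions inside one big buffer).
slices = {"1": (0, 44100), "2": (44100, 96000)}

matched_label = "2"  # e.g. whatever a nearest-neighbour query returned
start, end = slices[matched_label]
```

The extra lookup is just one dictionary access here, so the label-as-pointer approach costs little as long as both tables stay in sync.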
