Here is a list of potential ideas on how to play with the LPT example provided and discussed in yesterday’s presentation. The session was not recorded and some of these may not have come across clearly, so this is a brainstorm. Feel free to post your additions, implementations, and/or results in this thread.
Caveat: I should have made an abstraction for the analysis process, since it happens identically everywhere. I mention this because if you change parameters you’ll need to change them in several places; to start exploring the nearest neighbours you get, though, that is only 2 places (the analysis and the top layer of nearest).
Low-hanging fruits (results within seconds)
Time-based and somehow interrelated:
changing the FFT settings: at the moment all descriptors run with a 2048 window size and 512 hop size. Changing those could be fun, but they also need to be changed in the stats bundling (which is expressed in hops)
stats bundling: in the single message above bufstats you have 3 bangs preceded by 3 time windows, explained in the comment to its right.
how long is analysed: at the moment there is a minimum duration of 200 ms and an absolute duration of 500 ms (zero-padded or truncated depending on the source length). These are very personal biases that you can change easily.
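To make the interplay between the FFT settings, the hop-based stats windows, and the fixed slice duration concrete, here is a minimal Python sketch. The sample rate and all function names are my assumptions, not anything from the patch:

```python
import numpy as np

SR = 44100        # assumed sample rate
WINDOW = 2048     # analysis window size in samples
HOP = 512         # hop size in samples
MIN_MS, TARGET_MS = 200, 500  # the duration biases mentioned above

def ms_to_hops(ms, sr=SR, hop=HOP):
    """Convert a time window in ms to a hop count for the stats bundling."""
    return round((ms / 1000.0) * sr / hop)

def fix_duration(samples, sr=SR):
    """Zero-pad or truncate a slice to TARGET_MS; reject anything under MIN_MS."""
    min_len = int(sr * MIN_MS / 1000)
    target = int(sr * TARGET_MS / 1000)
    if len(samples) < min_len:
        return None                    # too short to analyse
    if len(samples) >= target:
        return samples[:target]        # truncate
    return np.pad(samples, (0, target - len(samples)))  # zero-pad

# a 500 ms slice at 44.1 kHz with hop 512 spans about 43 hops
```

This is why changing the FFT settings also means revisiting the stats bundling: the time windows there are counted in hops, and the hop count for a given duration shifts with the hop size.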
Value-based and again somehow interrelated:
the threshold of silence,
the threshold of pitch confidence
the threshold of centroid confidence.
Mid-hanging fruits (a few hours of fiddling)
One could change which stats are being bundled. At the moment they are mean and std of the values, and mean and std of their first derivative (how they change in time), for each of the 3 sanitised descriptors, for each time slot. That makes 4 stats x 3 descriptors x 3 timeslots = 36 values per point/entry/slice. The first thing to try would be replacing the stats with mean, std, 25th and 75th percentile for instance, or mean, 25th, 50th, 75th. That way you keep the same number of values, dismissing the derivative but taking into account 2 more statistical measurements. The ‘derivative’ could be considered almost covered by the 3-time bundling…
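A quick numpy sketch of what that swap would look like (function names are mine): the current bundle versus a percentile-based one, both yielding 12 values per time slot and hence 36 per slice:

```python
import numpy as np

def bundle_current(frames):
    """frames: (n_hops, 3) sanitised descriptor values for one time slot.
    Current scheme: mean, std, plus mean and std of the first derivative
    -> 4 stats x 3 descriptors = 12 values per slot."""
    d = np.diff(frames, axis=0)
    return np.concatenate([frames.mean(0), frames.std(0), d.mean(0), d.std(0)])

def bundle_percentiles(frames):
    """Suggested swap: mean plus 25th/50th/75th percentiles, dropping the
    derivative but keeping the same 12 values per slot."""
    pcts = np.percentile(frames, [25, 50, 75], axis=0)
    return np.concatenate([frames.mean(0), pcts.ravel()])

# either way: 12 values x 3 time slots = 36 values per point/entry/slice
```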
what is the next fruit ripeness metaphor? Blooming flowers? (a few days of fiddling)
one could explore how subsets could be done via datasetquery conditions, before finding the nearest neighbours.
one could weigh the various dimensions by multiplying the distances (inverse logic here: a small multiplier makes distances along that dimension smaller, so entries that differ on that descriptor can still come up as nearest neighbours)
one could remove outliers altogether in the sanitisation code (but beware of empty slots and stats in the time bundling… maybe a buffer per time slot would be healthier then)
that one smells more of pollen than flowers or fruit…
one could use lpt to generate analysis, and then use the discrete 1D patch to arbitrarily curate a single dimension proximity…
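The dimension-weighting idea above can be sketched like this (all names and the plain brute-force search are my assumptions; in the patch the scaling would happen before the kd-tree query):

```python
import numpy as np

def weighted_nearest(points, target, weights, k=5):
    """Multiply each dimension by a weight before measuring Euclidean
    distance. A small multiplier shrinks distances along that descriptor,
    so entries that differ there can still surface as nearest neighbours."""
    diffs = (np.asarray(points) - np.asarray(target)) * np.asarray(weights)
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    return np.argsort(dists)[:k].tolist()
```

In practice you would scale the dataset once before building the tree, and scale the query point the same way before asking for neighbours.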
To my ear the one without the centroid bump sounds worlds better, particularly in terms of the spectral response, whereas the one with the hyped centroid ends up having a random filter sound, as opposed to sounding like it is centered around that frequency range.
This is obviously a slightly exaggerated example, but I guess it shows that with this kind of “artificial value” sanitization you end up potentially throwing out resolution around those points.
(I also tried changing the centroid sanitization point to 100 and re-ran the analysis, but I think I did something wrong along the way as my results were exactly the same)
This is a bit of code to help me decide window and hopSize. I reversed the formula from the fluid.pitch~ help (thanks @tremblap for pointing that out).
Why do I need it: cutting my sounds (which are sometimes very short) into four chunks for temporal analysis left me with cases where the chunk was shorter than the window.
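I don’t have the exact formula from the fluid.pitch~ help in front of me, but the constraint it implies (the window has to fit inside the chunk) can be sketched like this; the power-of-two choice, overlap factor, and floor are my assumptions:

```python
def choose_fft(chunk_samples, max_window=2048, overlap=4, min_window=64):
    """Pick the largest power-of-two window that still fits in the chunk,
    with hopSize = window / overlap. Hypothetical sketch; the exact
    constraint in the fluid.pitch~ help may differ."""
    window = max_window
    while window > chunk_samples and window > min_window:
        window //= 2
    return window, window // overlap
```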
Oh, I forgot to mention earlier. I don’t know if this is a property of coll with so many iterations, but when I was running the LPT patch on a larger corpus (ca. 8000 slices) I would consistently get a stack overflow error:
I remember having an issue with urn many years ago where I was iterating over a choice until I got a satisfactory result, and after a Max update, the same process caused a stack overflow (my theory being that it only allows a certain amount of loops before it considers it to be an overflow and that number was changed).
That being said, I’ve never had issues with coll and 50k+ entries.
now you have dataset~ to replace the colls, or dicts.
I think that this is the right test to do. If you don’t get any difference, it means your other test’s methodology might also be at fault. The good news is that you have all the tools to do that comparison!
The utility of being able to “see” the contents is still pretty significant. But I’ll have a look to see what kind of use cases I can replace stuff with.
The second test involved me changing and rerunning the js code, which I’m not sure I did correctly. The first test was literally EQing a file and running it through the same exact process, so my theory is that the dodgy results come from the over-representation at the default sanitization points.
I take the Nicol-Loop and run it through four processes.
1. Default settings - default Nicol-Loop
2. Default settings - hyped Nicol-Loop
3. Centroid @ 100 sanitization - default Nicol-Loop
4. Centroid @ 100 sanitization - hyped Nicol-Loop
The hyped version of the Nicol-Loop is the same as before, with a peak around 69 (mtof’d).
Because my slicer settings are different, the main/core matching doesn’t sound as good to me, but the results between 1 and 2 are the same. For 2 the frequency jumps around a bit more overall.
The results of 3 and 4 are interesting. Not surprisingly, the energy gets pulled up towards the new “default” value of 100 (mtof’d). For my ears 4 doesn’t sound as “wrong” as 2 does, largely because the audio being sent isn’t centered around the sanitization point.
My takeaway from this is that having a central default works well, unless you end up having energy around that center point, in which case you increase your spectral matching error. Having a high default value pulls everything up, but is less prone to mismatching on incoming audio.
I don’t understand the patch structure well enough to be able to easily test this, but presumably with enough slices/segments, all bandit values can just get trashed and none of these issues would arise.
I have no idea how any of this would work if/when you switch to MFCCs. Like, what is a “default” MFCC value?
you would get into other problems. For instance, if only one frame has confidence, vs 0 frames, vs 10 frames… so you would probably need to add a ‘confidence descriptor’ to your statistics to see how well represented/described that section is…
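One simple way to realise that ‘confidence descriptor’ would be the fraction of frames whose confidence clears a threshold; a sketch, where the 0.8 default and the name are my assumptions:

```python
import numpy as np

def confidence_ratio(conf_frames, threshold=0.8):
    """Fraction of analysis frames whose (pitch) confidence clears the
    threshold: 0.0 means no frame was trustworthy, 1.0 means all were.
    The 0.8 default is an assumption, not a value from the patch."""
    conf = np.asarray(conf_frames, dtype=float)
    return float((conf >= threshold).mean()) if conf.size else 0.0
```

Adding that ratio as an extra statistic per time slot would distinguish a section described by one confident frame from one described by ten.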
that is what I was exploring. It is on the back burner for now because of all other things, hence passing the buck and sharing work in progress and ideas…
Another thing that strikes me now, thinking and listening further, is that even though it is the bandits that you are sanitizing, the fact that the overall sound world got shifted up when I changed it to 100 leads me to believe that there are a lot of bandits! It could also be the shit overlap between the two sound worlds here (circuit-bent toys and acoustic drums).
Indeed. Having this kind of detail and discussion around the more complex ideas/patches is fantastic.
another idea that came as I was writing: cluster/distance trees per parameter/space (LPT as 3 different trees), then looking for commonalities. Imagine asking for 10 clusters in each; that gives you 1000 combinations already, but in a single number. It should be fun to see what gets bundled together… but with a target from outside the source corpus, you might land in a void… anyway, this is so fun!
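The ‘single number’ encoding of the three per-space cluster indices could be as simple as this sketch (assuming 10 clusters per tree, indexed 0-9):

```python
def combined_label(loud, pitch, timbre, n=10):
    """Fold three cluster indices (each 0..n-1) into one number:
    10 clusters per tree gives 10 * 10 * 10 = 1000 distinct combinations."""
    return (loud * n + pitch) * n + timbre
```

With n=10 the result reads like three digits, one per space, which makes the combined labels easy to eyeball.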
If possible for Friday, I would certainly love a bit of a conceptual walkthrough (if no practical examples are available) of how this kind of stuff works “in context”. I find it hard to wrap my head around in general, but also specifically in situations where you want to nudge the kind of matching being done (e.g. “find the nearest match but that is short”, “find me an exact match for pitch, and/or return the matched value for compensation”, etc…).
I guess I’m quite used to the entrymatcher paradigm of querying, so it’s hard to conceive how things work in a more ML context, when I’m after similar (but better) results.