Various experiments with the LPT patch

Here is a list of potential ideas on how to play with the LPT example provided and discussed in yesterday’s presentation. It was not recorded, and maybe some of the ideas did not come across clearly, so this is a brainstorm. Feel free to post your additions, implementations and/or results in this thread.

Caveat: I should have made an abstraction for the analysis process, since it is exactly the same everywhere it happens. I am saying this because if you change parameters, you’ll need to change them in every place where the analysis occurs. To start exploring the nearest neighbours you get, though, that is only 2 places (the analysis and the top layer of nearest).

Low-hanging fruits (results within seconds)

Time-based and somehow interrelated:

  • changing the fft settings: at the moment all descriptors run at a window size of 2048 and a hop size of 512. Changing those could be fun, but they also need to be changed in the stats bundling (which is in hops)

  • stats bundling: in the single message above bufstats you have 3 bangs preceded by 3 time windows. They are explained in the comment on its right.

  • how long is analysed: at the moment there is a minimum duration of 200ms and an absolute duration of 500ms (zero-padded or truncated depending on the source length). These are very personal biases that you can change easily.
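To make the duration rule concrete, here is a minimal Python sketch of it. It is not taken from the patch: the function name is hypothetical, and it assumes that slices shorter than the minimum are simply skipped.

```python
def fit_to_duration(samples, sample_rate=44100, min_ms=200, target_ms=500):
    """Return exactly target_ms of audio, or None if the slice is too short.

    Assumption: slices under min_ms are skipped rather than analysed.
    """
    min_len = int(sample_rate * min_ms / 1000)
    target_len = int(sample_rate * target_ms / 1000)
    if len(samples) < min_len:
        return None
    # Truncate anything longer than the target...
    trimmed = list(samples[:target_len])
    # ...and zero-pad anything shorter.
    return trimmed + [0.0] * (target_len - len(trimmed))
```

Changing `min_ms` and `target_ms` is the equivalent of changing the two personal biases mentioned above.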

Value-based and again somehow interrelated:

  • the threshold of validity: in the javascript code, one could quickly change
    • the threshold of silence,
    • the threshold of pitch confidence
    • the threshold of centroid confidence.
  • the values on invalid entries: again in the javascript, one could change the value each entry is given when declared invalid.
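As an illustration of how those thresholds and replacement values interact, here is a rough Python sketch. It is not the javascript from the patch: every name, threshold and default value below is a placeholder, and it assumes the centroid is judged by how wide the spread is.

```python
def sanitise(loudness_db, pitch, pitch_conf, centroid, spread,
             silence_db=-60.0, min_pitch_conf=0.5, max_spread=30.0,
             default_pitch=69.0, default_centroid=69.0):
    """Replace invalid pitch/centroid values with fixed defaults.

    All thresholds and defaults are hypothetical placeholders,
    not the values used in the LPT patch.
    """
    if loudness_db < silence_db:
        # Too quiet: neither descriptor is trusted.
        return default_pitch, default_centroid
    if pitch_conf < min_pitch_conf:
        pitch = default_pitch
    if spread > max_spread:
        # Spread too wide: the centroid is not considered meaningful.
        centroid = default_centroid
    return pitch, centroid
```

Note that the default centroid is assigned in two places here (via silence and via spread), so changing the replacement value means changing it for both paths, which the single `default_centroid` argument takes care of.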

Mid-hanging fruits (a few hours of fiddling)

  • One could change which stats are being bundled. At the moment they are the mean and std of the values, and the mean and std of their first derivative (how they change in time), for each of the 3 sanitised descriptors, for each time slot. That makes 4 stats x 3 descriptors x 3 timeslots = 36 values per point/entry/slice. The first thing to try would be replacing the stats with mean, std, 25% and 75% for instance, or mean, 25%, 50% and 75%. That way you keep the same number of values, dropping the derivative but taking 2 more statistical measurements into account. The ‘derivative’ could be considered almost covered by the 3-time bundling…
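The two bundles can be compared side by side in a small Python sketch. This is only an illustration with hypothetical function names: it uses the population standard deviation and the "inclusive" quantile method from the standard library, which may not match what bufstats computes.

```python
from statistics import mean, pstdev, quantiles

def derivative(values):
    """First difference between consecutive frames."""
    return [b - a for a, b in zip(values, values[1:])]

def current_stats(values):
    """Mean and std of the values, then mean and std of their derivative."""
    d = derivative(values)
    return [mean(values), pstdev(values), mean(d), pstdev(d)]

def quartile_stats(values):
    """Alternative bundle: mean plus the 25%, 50% and 75% points."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return [mean(values), q1, q2, q3]

def describe(slots, stats_fn):
    """Flatten stats for every descriptor track in every time slot."""
    return [s for slot in slots for track in slot for s in stats_fn(track)]

# 3 time slots x 3 descriptors, each a short made-up track of frame values
track = [1.0, 2.0, 4.0, 3.0, 5.0]
slots = [[track, track, track] for _ in range(3)]
```

Either `describe(slots, current_stats)` or `describe(slots, quartile_stats)` yields 36 values per slice, so the rest of the process does not need to change size.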

What is the next fruit ripeness metaphor? Blooming flowers? (a few days of fiddling)

  • one could explore how subsets could be done via datasetquery conditions, before finding the nearest neighbours.
  • one could weigh the various dimensions by multiplying the distances (inverse logic here: small multipliers make distances smaller, so that descriptor is more likely to trigger nearest neighbours)
  • one could remove outliers altogether in the sanitisation code (but beware of empty slots and stats in time bundling… maybe a buffer per time slot would be healthier then)
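The weighting idea can be sketched in a few lines of Python. This is a brute-force illustration with hypothetical names, not how the patch searches for neighbours: each per-dimension difference is scaled by a weight before the Euclidean distance is taken, which is equivalent to scaling the data itself.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with each dimension's difference scaled by a weight."""
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights)))

def nearest(target, corpus, weights):
    """Index of the corpus entry closest to the target (brute-force search)."""
    return min(range(len(corpus)),
               key=lambda i: weighted_distance(target, corpus[i], weights))

corpus = [(1.0, 0.0), (0.0, 2.0)]
target = (0.0, 0.0)
```

With `weights=(1.0, 1.0)` the first entry wins. With `weights=(1.0, 0.1)` the large difference on the second dimension is shrunk, so the second entry becomes the nearest: that is the inverse logic mentioned above.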

That one smells more of pollen than flowers or fruit…

  • one could use lpt to generate analysis, and then use the discrete 1D patch to arbitrarily curate a single dimension proximity…

Let me know if I forgot anything we talked about!


(reposting the info from the other thread here, as this makes a better home for it)

So after yesterday’s FluCoMa chat I decided to test out the data sanitization stuff that @alicee brought up and that I further speculated on.

I created a peak filter hyping the frequencies around the sanitized centroid/point and recorded the before/after results.

Here’s a (shitty phone) video showing the results:

And here’s the audio only (mosaic on left channel, drum loop on right):

To my ear the one without the centroid bump sounds worlds better, particularly in terms of the spectral response, whereas the one with the hyped centroid ends up having a random filter sound, as opposed to sounding like it is centered around that frequency range.

This is obviously a slightly exaggerated example, but I guess it shows that with this kind of “artificial value” sanitization you potentially end up throwing out resolution around those points.

(I also tried changing the centroid sanitization point to 100 and re-ran the analysis, but I think I did something wrong along the way as my results were exactly the same)

This is a bit of code to help me decide window and hopSize. I reverse the formula from the fluid.pitch~ help (thanks @tremblap for pointing that out).
Why do I need it: cutting my sounds (which are sometimes very short) into four chunks for temporal analysis left me with cases where the chunk was shorter than the window.

This makes sure that I always get at least 8 analysis frames, and that for longer sounds hopSize does not go beyond 2048.

window and hopSize are sent to all analysis objects.
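Since the patch itself is not visible here, the following Python sketch is only a guess at the logic described, not the code that was posted. It assumes an overlap of 4, power-of-two hop sizes, a hypothetical lower bound on the hop, and the frame-count formula from the help file with the division rounded down.

```python
def choose_fft_settings(num_samples, overlap=4, min_frames=8,
                        max_hop=2048, min_hop=16):
    """Pick (window, hop) so that a chunk yields at least min_frames frames.

    Assumptions (not from the original patch): hop sizes are powers of two,
    window = hop * overlap, and the frame count is
    ((num_samples + window) / hop) - 1, rounded down.
    """
    hop = max_hop
    while hop > min_hop:
        window = hop * overlap
        frames = (num_samples + window) // hop - 1
        if frames >= min_frames:
            break
        hop //= 2  # too few frames: halve the hop and try again
    return hop * overlap, hop
```

For example, a 1000-sample chunk ends up with a window of 512 and a hop of 128 under these assumptions, while anything long enough stays at the 2048 hop ceiling.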


Oh, I forgot to mention earlier. I don’t know if this is a property of coll with so many iterations, but when I was running the LPT patch on a larger corpus (ca. 8000 slices) I would consistently get a stack overflow error:

When I went down to a smaller size it was fine.

I remember having an issue with urn many years ago where I was iterating over a choice until I got a satisfactory result, and after a Max update, the same process caused a stack overflow (my theory being that it only allows a certain amount of loops before it considers it to be an overflow and that number was changed).

That being said, I’ve never had issues with coll and 50k+ entries.

now you have dataset~ to replace the colls, or dicts.

I think that this is the right test to do. If you don’t get any difference, it means your other test’s methodology might also be at fault. The good news is that you have all the tools to do that comparison!

This is great, thanks for sharing!

My pleasure. Actually, that bit of patch was the one that allowed me to really understand overlap and edge cases in our (and some of python’s) frame processings…

The utility of being able to “see” the contents is still pretty significant. But I’ll have a look to see what kind of use cases I can replace stuff with.

The second test involved me changing and rerunning the js code, which I’m not sure I did correctly. The first test was literally EQing a file and running it into the same exact process, so the dodgy results of that come from, my theory is, the over representation at the default sanitization points.

you now have 3 ways to look at them: print, dump and save. Check example 7 for a dict integration.

I know, but now you can see if your theory is right. Change the value, reload, run. Solutions.

Ok, turns out I changed only the value when assigning via silence, but not via a spread that is too wide.

I don’t remember the segmentation settings I used yesterday, but I re-ran it to compare with setting the default centroid at 100 when the spread is too wide and here are the results:

I take the Nicol-Loop and run it through four processes.

  1. Default settings - default Nicol-Loop
  2. Default settings - hyped Nicol-Loop
  3. Centroid @ 100 sanitization - default Nicol-Loop
  4. Centroid @ 100 sanitization - hyped Nicol-Loop

The hyped version of the Nicol-Loop is the same as before, with a peak around 69 (mtof’d).

Because my slicer settings are different, the main/core matching doesn’t sound as good to me, but the results between 1 and 2 are the same. For 2 the frequency jumps around a bit more overall.

The results of 3 and 4 are interesting. Not surprisingly, the energy gets pulled up towards the new “default” value of 100 (mtof’d). For my ears 4 doesn’t sound as “wrong” as 2 does, largely because the audio being sent isn’t centered around the sanitization point.

My takeaway from this is that having a central default works well, unless you end up having energy around that center point, in which case you increase your spectral matching error. Having a high default value pulls everything up, but is less prone to mismatching on incoming audio.

I don’t understand the patch structure well enough to be able to easily test this, but presumably with enough slices/segments, all bandit values can just get trashed and none of these issues would arise.

I have no idea how any of this would work if/when you switch to MFCCs. Like, what is a “default” MFCC value?

#4 = double-click!

In a general sense, I’m not sure what I would use fluid.dataset~ for (other than natively in a TB2 context) vs just using dict or coll.

@tremblap, here is one more question about the LPT patch. You mentioned that 500ms at 44100 Hz leads to 43 analysis frames.

In the bufstats you then look at frames 2-45 - why are you not starting at the beginning and why are you going beyond the 43?

I made a mistake. The overlap of 4 generates (as per the bufpitch help formula)

Number of frames in the features buffer = ((source numFrames + windowsize) / hopSize) - 1

so ((22050 + 2048) / 512) -1 = 46 frames

I sacrifice the first 2 as they are incomplete (zero padded) but that is arbitrary
I sacrifice the last one for the same reason, and again completely on a hunch.
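The arithmetic above can be checked with a short Python sketch. The function name is hypothetical, and rounding the division down is an assumption that happens to reproduce the 46 quoted above.

```python
def feature_frames(num_samples, window=2048, hop=512):
    """Frame count as per the quoted formula, with the division rounded down."""
    return (num_samples + window) // hop - 1

total = feature_frames(22050)      # 500ms at 44100 Hz
# Drop the first 2 frames and the last 1 (zero-padded, hence incomplete).
kept = list(range(2, total - 1))
```

With these numbers `total` is 46 and `kept` holds 43 frame indices, running from 2 to 44 (0-based).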

feel free to change all these numbers.

you would get into other problems. For instance, if only one frame has confidence, vs 0 frame, vs 10 frames… so you would probably need to add a ‘confidence descriptor’ in your statistics to see how well represented/described is that section…

that is what I was exploring. It is on the back burner for now because of all other things, hence passing the buck and sharing work in progress and ideas…

Yeah, that occurred to me afterwards as well.

Another thing that strikes me now, thinking and listening further, is that even though it is the bandits that you are sanitizing, the fact that the overall sound world got shifted up when I changed it to 100 leads me to believe that there are a lot of bandits! It could also be the shit overlap between the two sound worlds here (circuit-bent toys and acoustic drums).

Indeed. Having this kind of detail and discussion around the more complex ideas/patches is fantastic.

Another idea that came up as I was saying this is to have cluster/distance trees per parameter/space (LPT as 3 different trees) and then look for commonalities. Imagine asking for 10 clusters in each: that gives you 1000 combinations already, but in a single number. It should be fun to see what is bundled together… but with a target outside of the source corpus, you might get into a void… anyways, this is so fun!
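The "1000 combinations in a single number" part can be sketched like this in Python. It only shows the bookkeeping, with hypothetical names, and assumes each of the three trees returns a cluster index between 0 and 9.

```python
def combined_label(loudness_cluster, pitch_cluster, timbre_cluster, k=10):
    """Fold three cluster indices (each 0..k-1) into one number (0..k**3 - 1)."""
    return (loudness_cluster * k + pitch_cluster) * k + timbre_cluster

def split_label(label, k=10):
    """Recover the three cluster indices from a combined label."""
    rest, timbre_cluster = divmod(label, k)
    loudness_cluster, pitch_cluster = divmod(rest, k)
    return loudness_cluster, pitch_cluster, timbre_cluster
```

Slices that share a combined label sit in the same cluster in all three spaces, which is one way of seeing what gets bundled together.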

Indeed curious on that stuff.

If possible for Friday, I would certainly love a bit of a conceptual walk-through (if no practical examples are available) of how this kind of stuff works “in context”. I find it hard to wrap my head around in general, but also specifically when it comes to situations where you want to nudge the kind of matching being done (e.g. “find the nearest match but that is short”, “find me an exact match for pitch, and/or return the matched value for compensation”, etc…).

I guess I’m quite used to the entrymatcher paradigm of querying, so it’s hard to conceive how things work in a more ML context, when I’m after similar (but better) results.

There is nothing on this Friday’s agenda yet, so let’s see how it populates… this is definitely interesting to me too!

