Various experiments with the LPT patch

Dear all

Here is a list of potential ideas on how to play with the LPT example provided and discussed in yesterday’s presentation. The session was not recorded and maybe some of the ideas did not come across clearly, so this is a brainstorm. Feel free to post your additions and/or implementations and/or results in this thread.

Caveat: I should have made an abstraction for the analysis process, since it happens in exactly the same way everywhere. I mention this because if you change parameters, you’ll need to change them in the various places; but to start exploring which nearest neighbours you get, that is only in 2 places (the analysis and the top layer of the nearest-neighbour search).

Low-hanging fruits (results within seconds)

Time-based and somehow interrelated:

  • changing the FFT settings: at the moment all descriptors run with a 2048 window size and a 512 hop size. Changing those could be fun, but they also need to be changed in the stats bundling (which is expressed in hops).

  • stats bundling: in the single message above bufstats you have 3 bangs preceded by 3 time windows. They are explained in the comment to its right.

  • how long is analysed: at the moment there is a minimum duration of 200ms and an absolute duration of 500ms (zero-padded or truncated depending on the source length). These are very personal biases that you can change easily. (A small sketch of the ms-to-hops arithmetic follows this list.)
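
Sketch, not the patch’s code: a tiny Python conversion from the durations above to hop counts, assuming 44100 Hz, the 2048 window and 512 hop mentioned in the first bullet.

# Minimal sketch (Python, not the patch): converting the durations above into
# hops, assuming 44100 Hz sample rate and the default 512-sample hop.
SR = 44100
HOP = 512

def ms_to_hops(ms, sr=SR, hop=HOP):
    """Whole hops covered by a duration given in milliseconds."""
    return int((ms * sr / 1000.0) // hop)

print(ms_to_hops(500))   # the 500ms absolute duration -> 43 hops
print(ms_to_hops(200))   # the 200ms minimum duration  -> 17 hops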

Value-based and again somehow interrelated:

  • the thresholds of validity: in the JavaScript code, one could quickly change
    • the threshold of silence,
    • the threshold of pitch confidence
    • the threshold of centroid confidence.
  • the values on invalid entries: again in the JavaScript, one could change the value assigned to each entry when it is declared invalid. (A sketch of this idea follows below.)
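
For concreteness, here is a rough sketch of the sanitisation idea in Python. The threshold and default values here are invented for illustration, and this is not the patch’s actual JavaScript.

# Rough sketch (Python, hypothetical names and values, not the LPT patch's js):
# an entry that fails a validity test gets a default value instead of the
# measured one.
SILENCE_DB = -60.0       # hypothetical silence threshold
PITCH_CONF_MIN = 0.8     # hypothetical pitch-confidence threshold
DEFAULT_PITCH = 69.0     # hypothetical value assigned when the entry is invalid

def sanitise_pitch(pitch, confidence, loudness_db):
    """Keep the measured pitch only if the entry passes the validity tests."""
    if loudness_db < SILENCE_DB or confidence < PITCH_CONF_MIN:
        return DEFAULT_PITCH
    return pitch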

Mid-hanging fruits (a few hours of fiddling)

  • One could change which stats are being bundled. At the moment they are the mean and std of the values, and the mean and std of their first derivative (how they change in time), for each of the 3 sanitised descriptors, for each time slot. That makes 4 stats x 3 descriptors x 3 time slots = 36 values per point/entry/slice. The first thing to try would be replacing the stats with mean, std, 25%, 75% for instance, or mean, 25, 50, 75. That way you keep the same number of points, dismissing the derivative but taking 2 other statistical measurements into account. The ‘derivative’ could be considered almost covered by the 3-window time bundling anyway… (See the sketch below.)
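
Here is a minimal sketch of that swap in Python/numpy (not the patch itself), just to show that the dimensionality stays the same either way.

import numpy as np

# Sketch (Python/numpy, not the patch): two ways of reducing one descriptor's
# per-frame values for one time slot down to 4 numbers.
def stats_with_derivative(frames):
    d = np.diff(frames)
    return [frames.mean(), frames.std(), d.mean(), d.std()]

def stats_with_quartiles(frames):
    return [frames.mean(), frames.std(),
            np.percentile(frames, 25), np.percentile(frames, 75)]

# Either way: 4 stats x 3 descriptors x 3 time slots = 36 values per slice.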

What is the next fruit-ripeness metaphor? Blooming flowers? (a few days of fiddling)

  • one could explore how subsets could be made via datasetquery conditions, before finding the nearest neighbours.
  • one could weight the various dimensions by multiplying the distances (inverse logic here: small multipliers make distances along that descriptor smaller, so that descriptor is more likely to yield nearest-neighbour matches; see the sketch after this list)
  • one could remove outliers altogether in the sanitisation code (but beware of empty slots and stats in the time bundling… maybe a buffer per time slot would be healthier then)
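
Here is a rough sketch of the dimension-weighting idea in Python/numpy (not the patch; a brute-force nearest-neighbour search rather than a kd-tree).

import numpy as np

# Sketch (Python/numpy, not the patch): weighting dimensions before a
# nearest-neighbour search. Multiplying a column by a small factor shrinks
# the distances along that dimension (the 'inverse logic' mentioned above).
def weighted_nearest(dataset, target, weights, k=5):
    """dataset: (n, d) array, target: (d,), weights: (d,) per-dimension multipliers."""
    dists = np.linalg.norm((dataset - target) * weights, axis=1)
    return np.argsort(dists)[:k]   # indices of the k nearest entries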

That one smells more of pollen than flowers or fruit…

  • one could use LPT to generate the analysis, and then use the discrete 1D patch to arbitrarily curate single-dimension proximity…

Let me know if I forgot anything we talked about!

p

(reposting the info from the other thread here, as this makes a better home for it)

So after yesterday’s FluCoMa chat I decided to test out the data sanitization stuff that @alicee brought up and that I further speculated on.

I created a peak filter hyping the frequencies around the sanitized centroid/point and recorded the before/after results.

Here’s a (shitty phone) video showing the results:

And here’s the audio only (mosaic on left channel, drum loop on right):
http://www.rodrigoconstanzo.com/bucket/NicolComparison.zip

To my ear the one without the centroid bump sounds worlds better, particularly in terms of the spectral response, whereas the one with the hyped centroid ends up having a random filter sound, as opposed to sounding like it is centered around that frequency range.

This is obviously a slightly exaggerated example, but I guess it shows that with this kind of “artificial value” sanitization, you potentially end up throwing out resolution around those points.

(I also tried changing the centroid sanitization point to 100 and re-ran the analysis, but I think I did something wrong along the way as my results were exactly the same)

This is a bit of code to help me decide window and hopSize. I reversed the formula from the fluid.pitch~ help (thanks @tremblap for pointing that out).
Why do I need it: cutting my sounds (which are sometimes very short) into four chunks for temporal analysis left me with cases where a chunk was shorter than the window.


----------begin_max5_patcher----------
1956.3ocyZszbharEdM9WQWZSlYBgoe+HUkEYSVlMY4TobICBak.RTRhYFmT
49a+d5tk.jUCHjkwP4RHZ0ce5uy6yQ9euaRzC4eOoLB8ynuflL4euaxD2P1A
lT+6IQqi+97UwktoEMOe85jrpno9mUk78J23eKMaQ92Pko+SBJd02hetDsHe
6CqRP4KQOkuw9flEsJMKYd91L2Jo0CltvsO4O7W+DQ1LyrsqSyVkT4nModvM
wUyeJM6w6KRlW4O6BAYFdJRHE1uHBr8KFLF5O2uS4aqZ1Jrcv+6t6rWl1SjC
6vCIEQANtTUztiVQ75jpjh6SxhAv6n0EhEIWLSLEw0hYLiTqLbi8ilS..5.F
83.qga5Gp54MI9MMJZJJ5g3rGiP+43hc4Xh8Z43sE1S9FfzNJ7eBQmEjgvCp
7ROOn4A.M0X0ENAnIGCzKWkGWMpHtD8T08uvTtExIQCSUW.3qCxUmSbiGYj4
8fcTvwFpOoPfSadC.2wbMuLu.sJO6wjBTI30cQ4Tv0b1OTgdLG8Pxy4YKrNn
+CqmaJlq2YOWjTBaWbUZd1AvhPoTGt3b2WXoCJ3CfhkxyyWkW3WA1Ji2e4PP
2hCian7x7rJmbvxfoyHL7AeHANdGJI5utm2pCqCX0onc.0am7Yc7eCwLqPqR
hKqPZTbV7pmKSKQKsNTK6o7fKMyXTt1vEDtRB3QOlRGh48P5vX3tRGhReEkN
MlF+h03oXdBBH0u4DLnOi9vu6BLZywwKrPoYnpmRPKShq1B7.zCaWtDlvOgH
e7rBRhxaXomIbLSPLRLXAVMpBR8UUPR8ojoTyT.9TbrS4ToMfFK4sPNdDW7v
oMc810s7v0hsHFZX6PNPzjglqRpUEz+0fhcCg8AJsWDWrN1cdkcwrg+Fj5FS
p6JoU7apT29LREJ0MCYfp.A8R8tj41EH8039H8a4ifM6jJE14lA6jiD+ZQZ7
pnyx5nl.VOReperaD0khSm2KGGTsAebryDhYbhjKIdPqrflAnV2gSnOigyQU
hFeNvIyOlStXlfAOy7Rl.0XB3LUcSvDpPPP9.P+xK6gJ7o+Kn91TL3fEdOES
eUtLNVdOwKh2TgpaqSruFAW2cVjrIIaAfHTdlufBzhsEtDDrY+jWr.x2oJG8
2IIaNLs1Z8GHinp80ej+UXx1Xx1sG0oRkfcLhEH7s5RkAbhSFP7Y5yXtewYW
oNFIFyHu0pS1B.vZvjgoUt7G4zatll.oAGroIrKMxaClYrtNMuk5YxmcJ2ga
TzPwLNPfBNcn9HuNYaPHuqYaTy5BGi8cJaii43sukT9KnO7gNkj9in8go+HT
fZc0qezV.54p+TOSyfTYU6KADQgXu1A47lwHiXsnby6Qonbbqxq4BGjnWuRQ
+Q3HXjA8IfGnOAFiFvmf7cvmvQ6vZVw84Kuuo2V22t2VGxCFX1TBrZFQY4A9
neXsUwipulMZsEl5ZRfIFIYZv6NhEBM5s1yHCb7oDFHOHl6CUvORZ3B00qg7
eBXMfge.0CVPNBILGgNHNBTMuC0drazFCHeLptbDt51NnK+VHlaPcoarJ7+T
3zREu45ZM7HpqAYl.JdWYsKn3rJH5ZWdg7MnhERchD7lqWJPGthvEXEodKPt
ury2qpytj1DNlnOrq02WdwQbIjlsL++gdpx0GhPMKdfooP7uxaBg64Dr5vtL
4IAu4rs.ZUZ4AsBpm2D7UM3NBtVt7h+yfbnxNda1ouHj5ySSZrn8HYQRYUZ1
tz1+x9JBsSJnHq2ThzGJQGEJI5Eo3slkqYXNEiqBwUsl05zEaxAoasTTHc0h
wYAxjQJ7kF08YcgB90AEZePBaLjXuPVbLllbTnkrWzZbvkpWzZTruX8gThwf
R8RCeDHDsOtL3iEgHmgPZ7HPo9HjD9CyqhN8xdZL.DG2W25ji4siI4ce8RLC
eFLD3eSwzXi.9yVfBw5WLvydsnfzW90QQgsWuceSgTUcydZgBWdEAd1qDE8w
tbT70zGOnlQfNpdZs7ZoitGzYLhGn6iwhYLjPldPH0HQG44bbNFzoOFnlwHV
foWtNCDH0mKd7lMeMonrd5Np.Ew7W9VKpm59YZl+mtRPhJR9ZZy7cuuzn3Bn
hjJnbjsE9dS7cou+PQqygL6x1lVmmJfOfjkUOu5k4+uqsFOEmUhw0MmDPyx3
sqpZyBhmOOIqc+OAmWDkz10SlPpkL6cXtBKLs6+4gca4Kt9sz80LbeZlkijz
r23oGb4vU7visNCRImK414IgDcI9SiVSTrWtrkoqVsaoS1AL3LTW7UziEwKR
22G3IQsnD0Xasqa+wFMl6uqS2dqWEoYYZIUwcu2DMvYzZ+cZNK3xn6vE3o2M
Wo8ixememZsJnLYeo6T2+e00itoHeSdQi5n8elocyeaU9Nb1j7uWE4EHlTy4
2eoQQdZsoRArGN0JG2qV4YMn4j1ryGZZsSYCpVWZKg+xT17LNlg5kwpZYvHp
L8B.CEWMh.10RgSA4VfkSYBpwogonBt6NqvmL1xg0IkkwOlbpC1P3pC0V6.q
lFs9t2cBqFkVCbtoAuq6xtXK6wxX6Thu1VG8SJ5ahTu0tbxLNnVgot6.LyLC
VyxGQy0+M6gpbSrORoqMc28e28+wi4J4o
-----------end_max5_patcher-----------

This makes sure that I always get at least 8 analysis frames and, for longer sounds, that hopSize does not go beyond 2048.
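
Roughly, the idea in Python (a sketch under my assumptions: the frame-count formula quoted later in this thread and an overlap of 4, i.e. windowSize = 4 * hopSize; this is not a transcription of the patch):

# Sketch: pick a hopSize so that a source of num_frames samples yields at
# least min_frames analysis frames, capping hopSize at 2048 for long sounds.
# Assumes frames = ((num_frames + window) / hop) - 1 and window = overlap * hop.
def choose_fft_settings(num_frames, min_frames=8, max_hop=2048, overlap=4):
    # frames = num_frames / hop + overlap - 1
    # so: hop <= num_frames / (min_frames - overlap + 1)
    hop = min(max_hop, num_frames // (min_frames - overlap + 1))
    return overlap * hop, hop      # (windowSize, hopSize)

print(choose_fft_settings(22050))  # 500ms at 44100 Hz -> (8192, 2048)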

window and hopSize are sent to all analysis objects.


----------begin_max5_patcher----------
879.3ocyWErbaBCD8L9qPCSORYjDRBodJ+Gc5jAiENpECLfnIscZ+16JIvw3
PRcR8gvAlkkUZ22SuUB90ln3ssOnGhQeB8YTTzu1DE4c4bDM8bT7ghGJqKF7
gEePOLTrWGmDdmU+f06upxNnsVSy9AzGHnOPQejLGjYmOj1se8ir7YmMiGLM
0ZqeZoSN6Jrk2Ayws85RanrHbYJiHXBBS4txYjDDQvSw9KVtfnvLN3ToRwIH
JIEi9xiIoczNmExj2fK6O5zgTDGi9h6M+dyF2sjKjJZz2CP5ILQWQ42PFjYM
zytNnOilxOG8DJNk+N.98n6r2dWa2f4m5UnfL9pT.94o.JWkpdBEfISTfhSy
jXEFbJjuKD.dF3dSyt16eNRH+0RBqqCViDj4uKHgp5QytzsiUGpJK+C5lg1w
9RMvLfQyNzMU5B6XudvwU1Cc25Ba.cCTpks5ppADECAcxlJLrR.NYRu0JjJe
cRk77jpR3opLQJCykTNiJxHYBVBhS3+CVL64XwsEM6iSbDPag0YrjV8SWbso
47sc80my+RtNvaSy87NnnGqpc5AfeJrl1lSBhOEzpqmWblXWPlXWiLkcIXZt
b9+xD+RyDYkkshttuq6Glh1mDnY3qs8tGkI9GMMgG85t3d82Myw6kLwE8fLz
BZPP860tOHBmJDencmtuYz3G8FG7fTNX+Q84RklhCggdWQy.FSC5d.LUEi01
kLPQYotwV1VGppOivovdGjbAPCoYbgTj4rfSRvbEf5SD7UsMV+FXgcfN8Ut1
b+bdqowwH544FmbxsSGw18KpAgfwbMZfUlRPBUiTRxyNeXUl55iCM5HvfZXp
mKdeewNC.xo1+n3EYhB6JpByOVIwrfE3hrHSSihLOLofly3tfk.yHkAKIKa0
gQOhKNjDOrbW4AqvLsXTvlD09pml6npIuc8scs8ypQnNUGiez1dDmt2486kH
mgXxDy+3sYgbxTmROLGdYkm8lDOG.kiYdlOsy5nXC10W3NJ30I1BDWlhFViy
mVCthhoy.L7UPWQ.6OG4kf7BvxnYbpxqvxgiS7VtEex0dcXwWgudg8VX02Zu
1IcMyp9mZ8BcM4RIvbIqZ8zg8p6ruVMauzx2xtiKaUL7sCWr5xulw.YEl5s.
LmodyJqvIZ9uiyUTCvusDB2ghM+dyegMBYCu
-----------end_max5_patcher-----------

Oh, I forgot to mention earlier. I don’t know if this is a property of coll with so many iterations, but when I was running the LPT patch on a larger corpus (ca. 8000 slices) I would consistently get a stack overflow error:

When I went down to a smaller size it was fine.

I remember having an issue with urn many years ago where I was iterating over a choice until I got a satisfactory result, and after a Max update the same process caused a stack overflow (my theory being that it only allows a certain number of loops before it considers it an overflow, and that number was changed).

That being said, I’ve never had issues with coll and 50k+ entries.

now you have dataset~ to replace the colls, or dicts.

I think that this is the right test to do. If you don’t get any difference, it means your other test’s methodology might also be at fault. The good news is that you have all the tools to do that comparison!

This is great, thanks for sharing!

My pleasure. Actually, that bit of patch was the one that allowed me to really understand overlap and edge cases in our (and some of Python’s) frame processing…

The utility of being able to “see” the contents is still pretty significant. But I’ll have a look to see what kind of use cases I can replace stuff with.

The second test involved me changing and rerunning the js code, which I’m not sure I did correctly. The first test was literally EQing a file and running it through the same exact process, so my theory is that the dodgy results of that come from the over-representation at the default sanitization points.

you now have 3 ways to look at them: print, dump and save. Check example 7 for a dict integration.

I know. But now you can see if your theory is right. Change the value, reload, run. Solutions.

OK, it turns out I changed the value only when assigning via silence, but not via a spread that is too wide.

I don’t remember the segmentation settings I used yesterday, but I re-ran it to compare with setting the default centroid at 100 when the spread is too wide and here are the results:

http://www.rodrigoconstanzo.com/bucket/Nicol_Comparisons_All.zip

I take the Nicol-Loop and run it through four processes.

  1. Default settings - default Nicol-Loop
  2. Default settings - hyped Nicol-Loop
  3. Centroid @ 100 sanitization - default Nicol-Loop
  4. Centroid @ 100 sanitization - hyped Nicol-Loop

The hyped version of the Nicol-Loop is the same as before, with a peak around 69 (mtof’d).

Because my slicer settings are different, the main/core matching doesn’t sound as good to me, but the results between 1 and 2 are the same. For 2 the frequency jumps around a bit more overall.

The results of 3 and 4 are interesting. Not surprisingly, the energy gets pulled up towards the new “default” value of 100 (mtof’d). For my ears 4 doesn’t sound as “wrong” as 2 does, largely because the audio being sent isn’t centered around the sanitization point.

My takeaway from this is that having a central default works well, unless you end up having energy around that center point, in which case you increase your spectral matching error. Having a high default value pulls everything up, but is less prone to mismatching on incoming audio.

I don’t understand the patch structure well enough to be able to easily test this, but presumably with enough slices/segments, all bandit values can just get trashed and none of these issues would arise.

I have no idea how any of this would work if/when you switch to MFCCs. Like, what is a “default” MFCC value?

#4 = double-click!

In a general sense, I’m not sure what I would use fluid.dataset~ for (other than natively in a TB2 context) vs just using dict or coll.

@tremblap, here is one more question about the LPT patch. You mentioned that the 500ms at 44100 Hz leads to 43 analysis frames.

In the bufstats you then look at frames 2-45: why are you not starting at the beginning, and why are you going beyond the 43?

I made a mistake. The overlap of 4 generates (as per the bufpitch help formula)

Number of frames in the features buffer = ((source numFrames + windowSize) / hopSize) - 1

so ((22050 + 2048) / 512) - 1 = 46 frames

I sacrifice the first 2 as they are incomplete (zero-padded), but that is arbitrary.
I sacrifice the last one for the same reason, and again completely on a hunch.

feel free to change all these numbers.
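
For reference, a quick Python transcription of that arithmetic and the frame selection described above (a sketch, not the patch):

# Number of frames in the features buffer, then dropping the first 2 and the
# last one as described above (an admittedly arbitrary choice).
def num_feature_frames(source_frames, window_size=2048, hop_size=512):
    return (source_frames + window_size) // hop_size - 1

total = num_feature_frames(22050)   # 500ms at 44100 Hz -> 46 frames
kept = range(2, total - 1)          # drop the first 2 and the last frame
print(total, len(kept))             # 46 frames, 43 kept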

You would get into other problems. For instance, if only one frame has confidence, vs 0 frames, vs 10 frames… so you would probably need to add a ‘confidence descriptor’ to your statistics to see how well represented/described that section is…

that is what I was exploring. It is on the back burner for now because of all other things, hence passing the buck and sharing work in progress and ideas…


Yeah, that occurred to me afterwards as well.

Another thing that strikes me now, thinking and listening further, is that even though it is the bandits that you are sanitizing, the fact that the overall sound world got shifted up when I changed it to 100 leads me to believe that there are a lot of bandits! It could also be the shit overlap between the two sound worlds here (circuit-bent toys and acoustic drums).

Indeed. Having this kind of detail and discussion around the more complex ideas/patches is fantastic.


Another idea that came up, as I was saying: cluster/distance trees per parameter/space (LPT as 3 different trees), and then seeing the commonalities. Imagine asking for 10 clusters in each; that gives you 1000 combinations already, but in a single number. It should be fun to see what gets bundled together… but with a target from outside the source corpus, you might get into a void… anyway, this is so fun!
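
A trivial sketch of that single-number encoding in Python (hypothetical, assuming one cluster index per loudness/pitch/timbre tree):

# Sketch: with 10 clusters per tree, pack the three cluster indices into one
# three-digit code, giving the 1000 possible combinations mentioned above.
def lpt_code(l_cluster, p_cluster, t_cluster, n_clusters=10):
    return (l_cluster * n_clusters + p_cluster) * n_clusters + t_cluster

print(lpt_code(3, 7, 1))   # -> 371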


Indeed, curious about that stuff.

If possible for Friday, I would certainly love a bit of a conceptual walk-through (if no practical examples are available) of how this kind of stuff works “in context”. I find it hard to wrap my head around in general, but especially in situations where you want to nudge the kind of matching being done (e.g. “find the nearest match, but one that is short”, “find me an exact match for pitch, and/or return the matched value for compensation”, etc…).

I guess I’m quite used to the entrymatcher paradigm of querying, so it’s hard to conceive how things work in a more ML context, when I’m after similar (but better) results.

There is nothing on this Friday’s agenda yet, so let’s see how it populates… this is definitely interesting to me too!
