The structure and intended use cases for the corpus analysis abstractions

So I want to (quickly) test a bunch of different descriptors and statistics to see what works best for a given dataset, and I wanted to leverage the old-school fluid.processsegments and associated GUIs.

In looking at the patches, specifically -comparison, which @tremblap used in this week’s geekout, it struck me that this seems like quite a narrow/specific use case. Unless I’m missing something, there’s no way to run a folder of individual (pre-cut/sized) samples without re-segmenting them. Or rather, all of my files have a 20ms bit of padding at the start so they can run through fluid.bufampslice~ to accurately recreate “realtime” onset detection on the same sounds. So in this case I can’t bypass the fluid.segmentcorpus abstraction entirely, because the starts/ends of my files in the “big buffer” aren’t actually the starts/ends of what I want to analyze.

Modifying the concatenation subpatch wouldn’t work here either (or not without much grief), as it’s accum-based and wants consecutive numbers rolling out.

So the first question is about that. Is there a reason why individual files are concatenated before analysis? If the files are already sliced/separated, it seems it would be best to keep them that way, which also simplifies the playback layer later on with polybuffer~ etc…

This then also has knock-on effects further down the abstraction chain, as I can’t just generate my own dict with the start/end points I want, because everything expects all the audio to come from a single gigantic buffer.

The next question is about this section here:
[Screenshot 2021-03-19 at 13.05.11]

I remember having a…heated…discussion with @tremblap about this at the first or second plenary, but revisiting it now, I actually have no idea what these numbers correspond to. I’m asked to set the start and number of windows in frames, but get feedback in ms, and those times also go negative (which isn’t “real”, because there’s no negative time in any of the segments, nor windowing that starts before the segment). I also have no idea how settings here correspond to settings you can then use in any of the objects, or vice versa. In most of my analysis I’m using @fftsettings 256 64 256, so I end up with 7 frames of analysis, and for these purposes I want all of them. I don’t know what I’m supposed to put in these boxes to get the same results.
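For what it’s worth, here’s the frame arithmetic as I’d expect it to work, which is exactly why the GUI confuses me (a Python sketch; the frame-count formula and the padding explanation for the negative times are my assumptions, not checked against the actual abstraction):

# Sketch of the STFT frame arithmetic for @fftsettings 256 64 256.
# Assumes FluCoMa-style counting (numFrames = numSamples // hop + 1) and
# that the negative times come from windows being centred on their frame
# position, so early windows nominally start before the segment -- both
# are assumptions on my part, not verified behaviour.

win_size, hop_size = 256, 64
sr = 44100.0

def num_frames(num_samples, hop=hop_size):
    """Frame count for a segment of num_samples samples."""
    return num_samples // hop + 1

def frame_start_ms(frame_index, hop=hop_size, win=win_size, rate=sr):
    """Start time (ms) of a window centred at frame_index * hop."""
    centre = frame_index * hop
    return (centre - win // 2) / rate * 1000.0

print(num_frames(384))  # a 384-sample segment -> 7 frames
print([round(frame_start_ms(i), 2) for i in range(3)])
# [-2.9, -1.45, 0.0] : early windows "start" before the segment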

So yeah, kind of a “bump” on these old patches and abstractions, as they would come in handy right about now when I want to test a bunch of permutations of stuff, but the concatenation-based workflow here doesn’t work well for a bunch of individual files.

it does - just bypass the middle step (the segmenter) by passing the dict you get from the loader directly to the analysis. The data structure is the same. I did that in our geekery right in front of you - 1 cable to move.

What I’m saying is that this doesn’t work. I don’t want to analyze from the start to the end of the file. I have 20ms of “pre-roll” in all the files so I can run them through onset detection with the same windowing that would happen in real time. And I can’t just stick that segmentation into the middle step, because I don’t want to analyze a whole bunch of nearly silent 20ms fragments.

edit:
For a second I thought I could do a workaround, as I essentially load everything into the same buffer, analyze, then load again, analyze, etc… so I could potentially fill the dict with entries that all go from the start/end points I want (e.g. 880 1500, 882 1502, etc…). But it looks like the iteration happens internally in the abstraction, rather than outside the object, so I can’t handle the looping myself.
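(To be concrete, the entries I had in mind would look something like this - a hypothetical sketch, since I’m guessing at the exact structure the abstraction wants:)

# Hypothetical sketch of the dict I'd want to build: one entry per file,
# with my own start/end points past the pre-roll. The key name "bounds"
# is a guess at the format, not the documented structure.

files = {
    "2-voices-bottom-a001.wav": (880, 1500),
    "2-voices-bottom-a002.wav": (882, 1502),
}

segments = {
    name: {"bounds": [start, end]}  # start/end in samples
    for name, (start, end) in files.items()
}

print(segments)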

It handles the iteration because that’s pretty much all it exists to do! Maybe I’ve missed what you’re not able to do, but if you just want to not analyse the first 20ms of any given segment, can you not just use a startFrame in your analysis subpatch?

Well, it’s not always 20ms. There’s 20ms of pre-roll, but the onset detection algorithm always catches that +/- a few samples, so in (a version of) my folder-iterating patch I go through fluid.bufampslice~ first and take the first slice output of that as my startFrame. This works OK in my own subpatches, but it’d be good to utilize the p featureExtractor subpatch, to not have to faff around with buffers etc…

I may not be following what’s happening when the dict(s) get passed around, but it looks like the entire contents of the dict are passed into fluid.processsegments and then the iteration happens internally. If I follow the feedback loop of bangs, it looks like it’s copying bits from “the big buffer”, so I’m not sure how that would work if I change the contents of the buffer for every loop.

Yes, that’s fundamentally what it’s for.

No. But it does assume there’s a big buffer. You could change that quite easily by passing a separate source with each incoming dict entry, and changing the patch to look at that entry rather than the pattr that it currently uses.
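Roughly this shape, in pseudo-Python (all names illustrative, not the actual dict format):

# Illustrative sketch of the suggested change: each entry may carry its
# own "source" buffer name; otherwise fall back to one global buffer
# (standing in for the pattr the patch currently uses).

def process_segments(segments, analyse, global_source="big-buffer"):
    results = {}
    for name, entry in segments.items():
        source = entry.get("source", global_source)  # per-entry source wins
        start, end = entry["bounds"]
        results[name] = analyse(source, start, end - start)
    return results

# stub analyser, just to show the call shape:
demo = process_segments(
    {"a001.wav": {"source": "buf-a001", "bounds": [880, 1500]}},
    analyse=lambda src, start, frames: (src, start, frames),
)
print(demo)  # {'a001.wav': ('buf-a001', 880, 620)}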

I’ll have a look and see what I can pull apart. It’s just a funky-looking mix of Owen (dict, pattr) and PA (pack f f f f f i i i s s s s s s i i i f f f f) code all mushed together…

What makes it especially confusing is that my source buffer actually stays the same as well. I just replace the contents and bang for each iteration.

OK, a workaround is just to batch-process the onset stuff ahead of time to convert all the files. I’m still not sure of the purpose of making a big buffer out of little buffers only to then analyze little-buffer-sized pieces, but hopefully this will work for now.
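For reference, this is the kind of offline pre-processing I mean (a minimal Python sketch using soundfile; the bare amplitude threshold is a crude stand-in for fluid.bufampslice~, not its actual algorithm):

# Minimal batch sketch: trim the ~20ms pre-roll from every file by
# finding the first sample above an amplitude threshold. The threshold
# test is a crude stand-in for fluid.bufampslice~'s onset detection.

import glob
import numpy as np
import soundfile as sf  # assumes pysoundfile is installed

THRESH = 0.01  # linear amplitude; tune per corpus

for path in glob.glob("corpus/*.wav"):
    audio, sr = sf.read(path)
    mono = audio if audio.ndim == 1 else audio.mean(axis=1)
    above = np.flatnonzero(np.abs(mono) > THRESH)
    onset = int(above[0]) if above.size else 0
    sf.write(path.replace(".wav", "-trimmed.wav"), audio[onset:], sr)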

Aaand there appears to be some kind of bug in the abstractions somewhere with MFCCs. I noticed I was getting tons of zeros in the output, which I thought was something to do with my files/settings, but testing the default version in the examples folder with @tremblap’s synth test corpus still gives me zeros.

If I use these settings:

I get a 76D dataset, which is the correct number of dimensions, but for whatever reason only 27 dimensions contain actual numbers and the rest are zeros:

{
  "cols": 76,
  "data": {
    "Macintosh HD:/Users/rodrigo/Sync/Projects/FluCoMa/PA segments/2-voices-bottom-a001.wav": [
      -1.1864736080169678,
      15.482986450195313,
      -43.20357894897461,
      41.022377014160156,
      8.5552339553833,
      9.647567749023438,
      -11.864818572998047,
      34.05266189575195,
      8.617773056030273,
      4.922311782836914,
      -5.95138692855835,
      25.397205352783203,
      7.330019474029541,
      6.532320976257324,
      -1.6914989948272705,
      20.652301788330078,
      9.974169731140137,
      5.500768184661865,
      -1.0469393730163574,
      27.767465591430664,
      9.182177543640137,
      3.990687847137451,
      -0.5536425113677979,
      18.92623519897461,
      8.185527801513672,
      4.1357951164245605,
      0.2896668016910553,
      17.005983352661133,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],

I tried fewer MFCCs too, but that still always gives me a bunch of zeros in the dataset, though the overall number of dimensions is correct.

If I run other descriptors/stats (loudness/pitch/etc…) I get valid output, so it seems that something weird is going on with the MFCCs in the default patch.

It’s not a bug; it’s a ‘feature’ of the output being driven by the maxNumCoeff. Eventually, we might be able to fix this in Max…


Ah right. I had modified the abstraction to be able to limit the min/max freq range, but didn’t think to check that.
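So if I follow, the picture is something like this (a numpy sketch of my reading of the explanation - i.e. the features buffer is sized by maxNumCoeff and only numCoeffs channels actually get written, which is an assumption on my part):

# Sketch of my reading of the maxNumCoeff 'feature': the features buffer
# is sized by the max, only numCoeffs channels get written, and the zero
# channels survive the stats/flatten stage downstream.

import numpy as np

num_coeffs, max_num_coeffs, n_frames = 13, 38, 7

features = np.zeros((max_num_coeffs, n_frames))                # sized by the max
features[:num_coeffs] = np.random.randn(num_coeffs, n_frames)  # only these written

mean_per_coeff = features.mean(axis=1)  # one stat per channel
print(mean_per_coeff[num_coeffs:])      # all zeros, just like the dataset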

Once things settle, these abstractions could definitely use a nice revamp and some love (loudness/confidence weighting, etc…), as they can make building stuff much easier.

I’ll put in a request to avoid concatenation when files are already segmented, though the dict-based workflow is quite clean and elegant.

I guess I misunderstood. Is there some other hard-baked value elsewhere? I manually edited the underlying mfcc-gui.maxpat to have a higher maxnumcoeff, but I still get the zeros:

As in, the fluid.bufmfcc~ object is now this:
fluid.bufmfcc~ 40 @features #0tmp.mfccs @blocking 0 @numchans 1 @minfreq 200 @maxfreq 10000

Giving this a bump after the chat with @tremblap at the last geekout. If I run this helper thing with the settings shown there, I still get loads of zeros (as you can see in the console).

I’m happy to report that it works as expected. So something is broken in your mod I think… but please check the patch here and tell me if it doesn’t behave as you think it should, and/or what you do not understand:


----------begin_max5_patcher----------
901.3ocwX00bhBCE8Y7WQFd10gvWJ6Sc+crSGm.DbSWHv.Aqa6r9aeStAnZK
pnM18AQHbI2bN4jC2vqyrriK2QarQeG8Sjk0qyrrflTMX0cskcAYWRNoABy9
IlXQBMOONuL421y0QTW9Lb2UNcMjUxEbRAEdjeTyH41GbmLRBbm9nYoPbkwO
8MW29.4sELdNU.cr6aMV1J5a0+f9rg8BzmX2E8caRY9QObEQj7KFey5ZZhPi
4HriLbTj5gPX+H0eK8j+gdr6gzoS7mJp9IryYMB64H6genGUQ92YyTGlOQZj
SeVB2drJn6fAjZHmiDTUF9H0fGkZvmkZFAytQ.L8WAPOvYQvbjq64f76faWX
Mjszz0xQlrmWSDhZVbqPKlrFPskMsHlldvrsbLIGKrFVIW0ZHzHvdFhCUSPY
sbNMeLRz4JHQ7YHwUA.I54CjXn534IQP2X.sxVR8vJqr7VV5h31LWsp7cLww
2F8f7rLZMJKmLl9BuzrTiCnuvdKuL2Xz0PZTtGfI5gFRQUCB6LFfCuB.eFSj
kgZoPjFudWFuY4kDvDIlv2XTvOLkqPufx2idfriII.IST1VmPQxASUqjXRkt
LLNQHWGdREQvUPPdWTQ3E.JBO7pKyP.wL+.l51XIoijP5yLBzFLSqH0xkSBZ
8ZJmDmezKlLxBA2PMd86OdATeWjCEYII6Gz.LNHAxnDQaMsoSSLBMEYVA.FC
bPPzWl.3BdD8KFNiKwJyXRneaauondkvc2jnf1zP1P+.7wifykWQgWmSziC0
0RAkT4FMgpKtInQegjlreDb3al4Ku.XUam2dP+wSCiF1FtpL2gytG5U8B2Q0
odlA2tg9ZE5p2JL9+0Kyjipx5zNTOVwbiBYuqUx18JabjdFeBlS2gYXdIqgN
ldFalpx5AIVW7u+8Ejmx3olpJ.8TavwyLVPK86bWCmZs42FHU64ok0OlUWom
IaKRJoYYMiAw.yLa5ocY6JsXRuN4XLBwI2SB+869GRlp8iAttvgttpGIn2R2
AUSdPPQ5XFkambh7lPhTe1fOelblRlvFHSpsZc4L4XhLMkoo9gymJSQSIQAl
.R3uHk2jljLAfto7nWASpp1Rq6+XJPJjlTOUBVQqlCWx35KA+DoE71gO9BF1
niMoV5uHjlKxMC.VU6B0UQYWTlRq4RSNUvyTnSlSvAT8MHZp59PdfQ4r+N6e
rvkwVB
-----------end_max5_patcher-----------

you might need to replace fluid.bufflatten~ with fluid.buf.flatten - same interface, a new C++ compile is coming.

edit: actually, you had that one; it is bufselect~ and a new friend that are coming anew and shiny :wink: I know because I tested the code above against the compiled version you have (I have 5 parallel versions on my Mac for bespoke support between versions - I hope you appreciate the love and care).

Oh, the individual object works fine. It’s the help/abstraction stuff that is returning zeros. It’s a bit of an untraceable patch to troubleshoot, as it all packs together in a very PA-esque:

pack s i i i s i f f s i i s i i i i i i i s i i i i i i i i i i 1 0 2 0 3 0

(that is literally copy/paste from the code)

I can manually get bits working like in your example, but changing which descriptors/stats/derivs/settings you want requires a whole bunch of coding for each permutation if you’re putting together an aggregate dataset.
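(To show what I mean by permutations, a toy Python sketch of the kind of grid I’d want to sweep without hand-patching every combination - the option names are made up:)

# Toy sketch of the permutation problem: every combination of descriptor,
# stats, and derivatives wants its own analysis pass. Names here are
# illustrative, not real FluCoMa parameters.

from itertools import product

descriptors = ["mfcc", "loudness", "pitch", "spectralshape"]
stats       = ["mean", "std", "min", "max"]
derivs      = [0, 1]

for desc, stat, deriv in product(descriptors, stats, derivs):
    job = f"{desc} / {stat} / deriv={deriv}"
    # ... run the buf* analysis + stats + flatten for this combo ...
    print(job)

print(len(descriptors) * len(stats) * len(derivs), "permutations")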

Hopefully the C++ selecting stuff will help, though I’d still infinitely prefer not to get things I don’t want in the first place, so you can then just go fluid.descriptors~ → fluid.stats~ → fluid.flatten~ and be done.

that’s hilarious

none of these needs changing if you change the number of coefficients - it is just a way to store the preset of the analysis in the dataset. I could have made it a dict. Maybe when I review them.

I am a very funny person. Even in my preset coding.

Yeah, I know, but when changing fluid.bufmfcc~ (manually) to something that would allow for 20 MFCCs, and then using the abstraction layer above it, I get zeros returned, as if it’s only passing 13 (or 12). The pack thing is just to show that it’s kind of impossible to follow the messages being generated, to troubleshoot where things are failing.

To avoid confusion and cross-talking, this screenshot shows what I mean. You can see the attributes as they are set, the settings I put in the helper patch at the bottom, and what that returns in the console (there are more zeros than you see, but you can obviously see the last ones from the print command).

I initially thought this was because of this, from above:

“It’s not a bug; it’s a ‘feature’ of the output being driven by the maxNumCoeff.”

…which led me to manually edit/save the abstraction so it has a higher maxnumcoeff (which you can see in the attributes), but that doesn’t solve the problem.