Compare live audio input to corpus / pure data

hello,

I began using FluCoMa in a project a few months ago, first replacing what I was doing manually before discovering this library (slicing & classification by hand :sweat: ). My next step is to do some instrument design, where I compare arbitrary live input (contact mic on a prepared guitar with different types of excitation, voice through an SM57) to a corpus.

I read a lot on this forum, but as a Linux user I don't have the possibility to open Max patches on my computer in order to fully understand all the information, so I'm left with many questions.

One is: how does one prepare / calibrate an arbitrary live input before trying to match it against different corpora, so that more of the live signal can match efficiently?

@tremblap @rodrigo.constanzo searching the forum I found out you are both quite expert at this; if I could learn from your experience that would be great.

I attached an example patch for slicing & matching a corpus, aimed at any Pd users who might stumble upon this post.

corpusMatcher.zip (4.2 KB)

Howdie.

I guess it depends on what you are trying to do. Are you trying to analyze and trigger samples based on onsets (e.g. an "event" is detected, then you find the nearest match in a corpus) or trying to do continuous matching à la concatenation?

In either case, I'm not a pd expert by any means, but I did port a bunch of the SP-Tools abstractions over to pd:

You can find the pd abstractions at the bottom here:

These are pretty well optimized (their Max counterparts at least) to do robust analysis with low latency, and output lists and buffers depending on what you need.

There are also examples of classification in there.

What the SP-Tools examples don't do is play any audio (something I'm not knowledgeable about in pd since it's so different from Max).

I could have sworn there were more native pd FluCoMa examples, but it looks like the /examples folder only has a couple.

There are, however, some pd FluCoMa video tutorials as of a few months ago:

(not exactly the same subject, but there are a couple of these pd vids which may cover some things)

Let me know if that helps.

Hello Rodrigo,

thanks for the reply. I went through most of FluCoMa's documentation, half of the conferences & podcasts, all the tutorials, and also all your videos about SP-Tools.
I also opened your SP-Tools abstractions for Pure Data, as they contain nice information and ways of doing things.

What would be useful to me right now is more precise: some explanation of your way of matching a corpus and a live audio input "efficiently" when they don't have much in common. In a video about SP-Tools 0.2 you call this "scaling", for when the sound analyses don't overlap much.

Cheers in advance

Gotcha.

Once you're in that territory things get a little fuzzier and material-dependent.

Scaling is one way to do this, but doesn't always work well and can be problematic for some parameters (e.g. pitch).

The workflow I use in Max to set this up is to analyze a corpus, just as you have (more or less), then do a similar process for what I expect my input to be. I just went looking, as I drew myself a diagram of all the data steps when it got a bit tricky to follow, but it was basically robust scaling everything, then doing the matching on that (and in my case I throw in a bit of regression/prediction to match longer timeframes than is possible in realtime).

Then, obviously, the descriptor choice matters a lot. I see you're using melbands as your main descriptor, which works OK. If timbre is what you're after, I think MFCCs are better all around.
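If it helps to picture what the MFCC analysis step produces outside of Pd, here is a rough Python sketch using librosa as a stand-in for fluid.bufmfcc~ / fluid.bufstats~ (the function names and defaults are my own assumptions, not what SP-Tools does): one mean MFCC vector per slice, which is the kind of point you would then feed into the scaling and matching stages.

```python
# Rough, non-Pd illustration of "MFCCs as the timbre descriptor":
# reduce each corpus slice to one mean MFCC vector.
# librosa stands in for fluid.bufmfcc~ / fluid.bufstats~.
import numpy as np
import librosa

def mfcc_descriptor(samples, sr, n_mfcc=13):
    """Mean MFCC vector for one slice (first coefficient dropped,
    since it mostly tracks loudness rather than timbre)."""
    m = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=n_mfcc)
    return m[1:].mean(axis=1)          # shape: (n_mfcc - 1,)

def analyze_corpus(corpus_slices):
    """corpus_slices: list of (samples, sr) pairs, e.g. from your slicer."""
    return np.array([mfcc_descriptor(y, sr) for y, sr in corpus_slices])
```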

Lastly, I would suggest trying fluid.bufampslice~ over onsetslice as it is much more temporally accurate, so if you have sounds with sharp transients and/or short analysis windows (as in SP-Tools), having the kind of temporal slop you get with onsetslice can throw things off.


Cheers. I guess the key object here is "robust scale". As an example, if I'm to use my voice (live audio input) to control a corpus of glitch noises, do I do a robust scale of my corpus analysis, then compare it to the descriptors I get from my voice? I understand that the corpus will be scaled, but what about the live input (voice): how would you go about informing the patch of what kind of sounds I will make, so it can be scaled in advance?

(The regression/prediction for realtime matching of longer frames is fascinating, but my brain is already a bit too hot to process this!)

fluid.bufampslice~ seems great; I haven't used it yet (but I read its documentation). I remember reading in a post that using fluid.bufampslice~ in conjunction with noveltyslice could give good temporal accuracy plus the benefits of the noveltyslice method.
This is also something I want to try; for now I just have difficulties with how I would patch it…

It does get a bit funky.

Firstly, I mean "robust scale" as in fluid.robustscale~ (another flavor of standardization), as opposed to scaling that happens to be robust.

The approach I use is to give the computer some idea of the world in which I will be operating. I call this a "setup" in the context of SP-Tools, but that's not really a standard term. So with my approach you give it a couple of minutes of examples of the kind of thing you may do, which is then used to create the robust scaling for what the input will be, exactly the same way you are doing for your corpus. In my case I save this as a separate .json file that I load later along with the corpus stuff.
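To make that "setup" step concrete outside of Pd, here is a minimal sketch, assuming scikit-learn's RobustScaler as a stand-in for fluid.robustscale~ and an already-analyzed set of input descriptors; the file names and JSON layout are hypothetical, not the SP-Tools format.

```python
# Sketch of the "setup" idea: fit a robust scaler on a couple of minutes
# of example input descriptors, then save its parameters as JSON so they
# can be reloaded later alongside the corpus data.
# RobustScaler stands in for fluid.robustscale~; file names are hypothetical.
import json
import numpy as np
from sklearn.preprocessing import RobustScaler

# (n_frames, n_dims) descriptors analyzed from the "setup" recording
setup_descriptors = np.load("setup_descriptors.npy")

input_scaler = RobustScaler().fit(setup_descriptors)

with open("input_scaling.json", "w") as f:
    json.dump({"center": input_scaler.center_.tolist(),
               "scale":  input_scaler.scale_.tolist()}, f)
```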

Then when you get new input, you run it through robust scaling to turn it into the same kind of standardized space that your corpus is analyzed in. Ultimately you compare two scaled spaces to each other.

So in the end you have:
corpus → robustscale → robustscaled corpus
input → robustscale → robustscaled input "fit"

input → robustscale'd input → kdtree'd robustscaled corpus → nearest match
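As a conceptual sketch only (not the SP-Tools implementation), those arrows could look like this in Python, with scikit-learn's RobustScaler and KDTree standing in for fluid.robustscale~ and fluid.kdtree~; the random arrays are placeholders for real descriptor data.

```python
# Two scaled spaces compared to each other: scale the corpus and build a
# kd-tree on it, scale each live frame with the *input* scaler, then query.
import numpy as np
from sklearn.preprocessing import RobustScaler
from sklearn.neighbors import KDTree

corpus = np.random.rand(500, 12)       # placeholder: one descriptor vector per corpus slice
setup  = np.random.rand(200, 12)       # placeholder: descriptors from the "setup" recording

corpus_scaler = RobustScaler().fit(corpus)
input_scaler  = RobustScaler().fit(setup)

scaled_corpus = corpus_scaler.transform(corpus)
tree = KDTree(scaled_corpus)           # "kdtree'd robustscaled corpus"

def nearest_match(live_descriptor):
    """Scale one live frame into the standardized space, query the corpus tree."""
    scaled = input_scaler.transform(live_descriptor.reshape(1, -1))
    _, idx = tree.query(scaled, k=1)
    return int(idx[0, 0])              # index of the corpus slice to play
```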

I did experiment with this quite a bit, and in the end just went with fluid.bufampslice~ since it gave me the best results overall. The basic idea was to run fluid.bufonsetslice~ on everything, then go back and run fluid.bufampslice~ over the "hop" in which the onset was found, to narrow it down further. If fluid.bufampslice~ didn't find a more refined position, then default to the fluid.bufonsetslice~ version.
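For what it's worth, the hybrid idea can be sketched outside of Pd too. This is not what fluid.bufampslice~ does internally, just a simple numpy illustration of "refine the coarse onset within its hop, otherwise keep it"; the threshold and window handling are my own assumptions.

```python
# Refine a coarse onset from an FFT-domain slicer using a plain amplitude
# threshold inside the analysis hop; fall back to the coarse onset if
# nothing in the hop clears the threshold.
import numpy as np

def refine_onset(samples, coarse_onset, hop_size, db_threshold=-30.0):
    window = samples[coarse_onset:coarse_onset + hop_size]
    amp = np.abs(window)
    thresh = 10.0 ** (db_threshold / 20.0)      # dB -> linear amplitude
    above = np.nonzero(amp > thresh)[0]
    if above.size == 0:
        return coarse_onset                     # keep the onsetslice result
    return coarse_onset + int(above[0])         # refined, near sample-accurate
```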

In principle this works, but I feel like there's a lot that fluid.bufonsetslice~ doesn't catch at all, and the tradeoff for "non-amplitude-based transients" wasn't worth it.


Thank you for this detailed explanation!
