Using fluid.noveltyslice~ to find formal changes

After seeing @weefuzzy’s talk at the plenary, followed by some interesting discussion in this thread, I wanted to take a stab at using fluid.noveltyslice~ to segment live audio formally, rather than by events/onsets.

After playing with @fftsettings and @kernelsize (and getting some hearty crashes along the way), I found that @fftsettings 8192 128 -1 seemed to work OK, but it was still mainly reporting onsets (as opposed to shifts in form), depending on the type of material used.

@threshold seemed to be fairly all-or-nothing as well: a lot of activity around 0.2, and then above 0.4 I got nothing at all.

(As an aside, the helpfile/reference says that @threshold should be a value between 0-1, but the @attribute is not clamped at all. Is that intentional?)

Latency isn’t super critical for what I have in mind, but keeping a smallish hop size would be good; something along the lines of 300-500ms of latency would be passable for my use case.
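(A quick back-of-envelope on that, in Python rather than Max. The latency estimate below is my own assumption, not the documented formula; check the reference for the real figure.)

```python
# Back-of-envelope hop/latency arithmetic.
SR = 44100                      # assumed sample rate

def hop_ms(hop_samples, sr=SR):
    """Duration of one analysis hop in milliseconds."""
    return 1000.0 * hop_samples / sr

print(hop_ms(128))              # ~2.9 ms per hop with @fftsettings 8192 128 -1

# Supposing the slicer needs on the order of (kernelsize + filtersize)
# hops of context before it can report a slice point:
kernelsize, filtersize = 29, 10     # hypothetical values
print((kernelsize + filtersize) * hop_ms(128))   # ~113 ms, inside the budget
```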

I vaguely remember from @weefuzzy’s talk that he did a whole load of downsampling (the bane of my existence at the moment!) and iteration to “roughly” and “vaguely” tell where changes in material happened. Is there a FluCoMa-able approximation of this idea/approach?

I’m attaching an example audio file to show the kinds of sound I’m working with. Musically, I think there are around 3-4 types of material in here that I’d like to be able to differentiate between. I’ve mainly tested with @feature 0, as that seemed to work best in practice, although I thought @feature 1 would be better for this kind of noisy material.

turntable snare extract.mp3.zip (936.0 KB)

Things to try to find larger chunks:

  • larger hop size
  • downsampling
  • autothresh (check the examples for an iterative way of doing that)

Now, spectral novelty at small resolutions is very likely to pick up attacks, so you need to blur the time resolution to swallow them, hence proposals 1 and 2 above. As for the threshold, it is context-dependent, so autothresh is a good idea to explore.
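For intuition, here is a minimal numpy sketch of the Foote-style novelty idea as I understand it: a self-similarity matrix correlated with a checkerboard kernel. The names are mine, not the FluCoMa internals; the point is just that a bigger kernel averages over more frames, which is exactly the blurring that swallows attack-scale events.

```python
import numpy as np

def novelty_curve(frames, kernel_size=16):
    """Toy Foote-style novelty. frames is (n_frames, n_features), e.g. one
    spectral frame per hop. A bigger kernel_size averages over more frames,
    favouring formal changes over individual attacks."""
    # Cosine self-similarity matrix between all pairs of frames
    norm = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-9)
    S = norm @ norm.T
    # Checkerboard kernel: +1 within a section, -1 across a boundary
    k = kernel_size // 2
    quad = np.ones((k, k))
    kernel = np.block([[quad, -quad], [-quad, quad]])
    n = len(frames)
    curve = np.zeros(n)
    for i in range(k, n - k):
        curve[i] = np.sum(S[i - k:i + k, i - k:i + k] * kernel)
    return curve

# Peaks of the (smoothed, cf. @filtersize) curve above a threshold are
# the candidate slice points.
```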

I’ll investigate the crashes from the other thread, as we did not get them with our test material. Non-modulatable attributes are easy to implement, as they are everywhere else, so this might be either a real bug or an omission.


Where is this? I didn’t see anything on that in the helpfile(s) or /examples folder. (Or do you mean the “filter size” tab in fluid.bufnoveltyslice~?)

Would that also impact the chunk size, and not just latency? I guess it impacts how soon a new chunk can be assessed?

I didn’t even load material. I got those crashes just adjusting the params in the helpfile as-is. With audio off or on.

/examples/segmenting/nb_of_slices.maxpat

Yes. The plan is for you to find, with super-large windows on a large file, where the large changes happen, then to rerun on those chunks only with whatever other parameters.

For instance: you could do an ampseg looking for a minimum up-time of 1s and minimum silences of 500ms on a 10-minute file; you get large chunks.
Then you take those and do the fine-tuning.
How sexy and flexible is that?
:wink:
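Sketching that two-pass idea offline in Python (numpy only; the crude RMS gate below is a stand-in for a real amplitude slicer, and all the names are hypothetical):

```python
import numpy as np

def coarse_chunks(audio, sr, min_on=1.0, min_silence=0.5, thresh_db=-40.0):
    """Pass 1: crude RMS gate returning big (start, end) sample ranges,
    mirroring the 1 s minimum up-time / 500 ms minimum silence above."""
    hop = 512
    n = len(audio) // hop
    env = np.sqrt(np.mean(audio[:n * hop].reshape(n, hop) ** 2, axis=1))
    active = 20 * np.log10(env + 1e-9) > thresh_db
    # Collect contiguous active runs as (start, end) in samples
    runs, start = [], None
    for i, a in enumerate(np.append(active, False)):
        if a and start is None:
            start = i
        elif not a and start is not None:
            runs.append((start * hop, i * hop))
            start = None
    # Close gaps shorter than min_silence, then drop runs shorter than min_on
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_silence * sr:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return [(s, e) for s, e in merged if e - s >= min_on * sr]

# Pass 2: rerun a finer slicer (novelty, onsets, ...) inside each chunk only.
```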

Oh yeah, I remember seeing that.

The problem is…

I want to do it on real-time audio.

Unless you mean using that process to figure out what a suitable thresh might be, but it still doesn’t help with the fft/kernel/filter sizes, which are probably the critical things here.

Oh, I forgot to ask about this in my previous response. Would that then mean I’d also want to look at different @features, particularly if downsampling?

I should have guessed. Then you need to spy on your own ears and brain: how do you actually segment in real time? The answer is you don’t; you just remember and make sense of the past. So having some sort of past that is analysed bluntly to set some threshold automagically is something you can now, thanks to FluCoMa, explore :wink:
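One naive reading of that “analyse the past bluntly to set a threshold automagically” in code form (my own sketch, not the /examples patch):

```python
import numpy as np
from collections import deque

class AutoThresh:
    """Remember recent novelty readings and derive a threshold from them.
    Naive rule: mean + k standard deviations of the recent past."""
    def __init__(self, history_hops=1000, k=2.0):
        self.history = deque(maxlen=history_hops)
        self.k = k

    def update(self, novelty_value):
        self.history.append(novelty_value)
        h = np.array(self.history)
        return float(h.mean() + self.k * h.std())

# Feed each new novelty reading in; the return value is what you would
# set @threshold to for the current material.
```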

Segmenting on amplitude change and on MFCCs would be the two I’d choose, for loudness and spectra respectively.

Hehe, hence the thread title. :wink:

By this do you mean keeping a running buffer and running ‘offline’ analysis (in lieu of the real-time version), JIT style? Or is that what having larger FFT sizes enables? Or downsampling? Or both?

As in using either, or using two instances, one with each @feature?

I imagine wanting to know about the past, and whether I’m different enough to be elsewhere, or back where I was. That really depends on your definitions of “past” and “section”, so you need to experiment. But let’s say I want to know every 10 seconds where I am: I’d analyse at that frequency, get a reading of the past 30 seconds, and see if there is a change in the last 10… I have not done anything like this, but it is all possible. So I go back to my earlier hint: analyse the past bluntly and decide from that.
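A concrete (and entirely hypothetical) sketch of just that scheduling logic:

```python
def section_changed(slice_times, now, lookback=30.0, recent=10.0):
    """Given slice points (in seconds) from analysing the last `lookback`
    seconds, report whether any fall within the most recent `recent`
    seconds. Just the scheduling logic, not the analysis itself."""
    window = [t for t in slice_times if now - lookback <= t <= now]
    return any(t > now - recent for t in window)

# Call this every 10 s with fresh slice points: True means you are
# (probably) in a new section.
```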

Either, in relation to the above.

Right, doing a running-buffer offline reading and updating based on that (rather than having a massive 30s FFT inside the real-time version).
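Something like this, conceptually (a plain-Python ring buffer standing in for the buffer~ plumbing you would use in Max; names are mine):

```python
import numpy as np

class RingBuffer:
    """Keep the last `seconds` of audio for periodic offline analysis."""
    def __init__(self, seconds=30.0, sr=44100):
        self.buf = np.zeros(int(seconds * sr))
        self.write = 0

    def push(self, block):
        idx = (self.write + np.arange(len(block))) % len(self.buf)
        self.buf[idx] = block
        self.write = (self.write + len(block)) % len(self.buf)

    def snapshot(self):
        """Oldest-to-newest copy, ready to hand to an offline slicer."""
        return np.roll(self.buf, -self.write)
```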


Or having a massive 30s one in RT, but downsampled 32 times (let’s see if you can guess the FFT size now :wink:)

Is it as simple as figuring out a 30s FFT size and then dividing it by 32?

(screenshot of the FFT-size calculation)
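For the record, the arithmetic presumably shown there, assuming 44.1kHz and power-of-two FFT sizes:

```python
SR = 44100
factor = 32                              # downsampling factor
needed = 30.0 * SR / factor              # 41343.75 samples at the reduced rate

fft = 1
while fft < needed:                      # round up to a power of two
    fft *= 2
print(fft)                               # 65536
print(fft * factor / SR)                 # ~47.6 s of full-rate audio
# The next size down, 32768, covers ~23.8 s; pick whichever is closer
# to the 30 s you actually want.
```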


Ok, got something kind of working.

I’m testing it with smaller FFT sizes for now, since it’s quicker to hear the changes.

Also, what units is @filtersize in? Samples? Bins? More specifically, if I’m working at extreme downsampling, should I adjust my expectations for @filtersize accordingly?

novelty.zip (3.8 KB)

IIRC: @kernelsize is the window of time (in hops) over which the difference is computed.
@filtersize is in hops too, and is a smoothing of the novelty curve (because @weefuzzy famously said it is jittery as ***)
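Which also answers the units question above: both attributes count in hops, so under heavy downsampling each hop covers proportionally more time. A quick sanity check (my numbers, not canonical):

```python
SR = 44100

def hops_to_seconds(n_hops, hop, downsample=1, sr=SR):
    """Convert a @kernelsize / @filtersize value (in hops) to seconds,
    accounting for any downsampling of the analysed signal."""
    return n_hops * hop * downsample / sr

print(hops_to_seconds(10, 128))                 # ~0.03 s at full rate
print(hops_to_seconds(10, 128, downsample=32))  # ~0.93 s downsampled 32x
```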
