After seeing @weefuzzy’s talk at the plenary, followed by some interesting discussion in this thread, I wanted to take a stab at using fluid.noveltyslice~ to segment live audio by form, rather than by events/onsets.
After playing with @fftsettings and @kernelsize (and getting some hearty crashes along the way), I found that @fftsettings 8192 128 -1 seemed to work ok, but it was still mainly reporting onsets (as opposed to shifts in form), depending on the type of material used.
@threshold seemed kind of all-or-nothing as well: a lot of activity around 0.2, and then above 0.4 nothing really.
(As an aside, the helpfile/reference says that @threshold should be a value between 0 and 1, but the attribute isn’t clamped at all. Is that intentional?)
Latency isn’t super critical for what I have in mind, but keeping a smallish hopsize would be good; something along the lines of 300–500ms of latency would be passable for my use case.
I vaguely remember from @weefuzzy’s talk that he did a whole load of downsampling (the bane of me at the moment!) and iteration to “roughly” and “vaguely” tell where changes in material happened. Is a FluCoMa-able approximation of this idea/approach possible?
I’m attaching an example audio file to show the kinds of sound I’m working with, but I (musically) think there are around 3-4 types of material in here that I’d like to be able to differentiate between. I’ve mainly tested with @feature 0, as that seemed to work best in practice, although I thought @feature 1 would be better for this kind of noisy material.
autothresh (check the examples for an iterative way of doing that)
Now, spectral novelty at small resolutions is very likely to pick up attacks, so you need to blur the time resolution and swallow them; hence proposals 1 and 2 above. As for the threshold, it is context-dependent, so autothresh is a good idea to explore.
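A rough sketch in Python of that blur-and-autothresh idea. This is not FluCoMa’s actual algorithm: the moving-average smoothing and the median-based threshold are my own illustrative stand-ins for @filtersize-style blurring and a context-derived threshold.

```python
import numpy as np

def smooth(novelty, filtersize):
    """Moving-average blur of a novelty curve (same spirit as @filtersize)."""
    kernel = np.ones(filtersize) / filtersize
    return np.convolve(novelty, kernel, mode="same")

def autothresh(novelty, k=4.0):
    """Adaptive threshold: median plus k times the median absolute
    deviation of the curve itself, so it tracks the context."""
    med = np.median(novelty)
    mad = np.median(np.abs(novelty - med))
    return med + k * mad

# Toy novelty curve: a gently wobbling floor with two real changes in it.
t_axis = np.arange(200)
curve = 0.05 + 0.02 * np.sin(t_axis / 3.0)
curve[60] += 1.0    # a big formal change
curve[140] += 0.8   # a smaller one

smoothed = smooth(curve, 5)
thresh = autothresh(smoothed)
# indices that exceed the context-derived threshold, around 60 and 140
peaks = np.flatnonzero(smoothed > thresh)
```

Because the threshold is derived from the curve itself, the same code works whether the floor sits at 0.05 or 0.5, which is the point of exploring autothresh rather than a fixed @threshold.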
I’ll investigate the crashes from the other thread, as we did not get them with our test material. Making it non-modulatable is easy to implement, as it is everywhere else, so it might be either a real bug or an omission.
Yes, so the plan is for you to find, with super large windows on a large file, where the large changes happen. Then rerun on those chunks only, with whatever other parameters.
for instance:
You could do an ampseg looking for a minimum up-time of 1 sec and minimum silences of 500ms on a 10-minute file. You get large chunks,
then you take those and do the fine-tuning.
how sexy and flexible is that?
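The coarse pass above could be sketched like this. To be clear, this is not fluid.ampslice~; it is a crude amplitude gate with minimum “on” and “off” durations, just to show how you get the large chunks that you then re-analyse with finer settings.

```python
import numpy as np

def coarse_chunks(env, thresh, min_on, min_off):
    """env: amplitude envelope, one value per hop.
    Returns (start, end) hop-index pairs: gaps shorter than min_off
    are swallowed, and runs shorter than min_on are dropped."""
    on = env > thresh
    edges = np.diff(on.astype(int))
    starts = list(np.flatnonzero(edges == 1) + 1)
    ends = list(np.flatnonzero(edges == -1) + 1)
    if on[0]:
        starts.insert(0, 0)
    if on[-1]:
        ends.append(len(on))
    runs = list(zip(starts, ends))
    # merge runs separated by a silence shorter than min_off
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_off:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    # keep only runs at least min_on long
    return [(s, e) for s, e in merged if e - s >= min_on]

# demo: two bursts separated by a 20-hop gap, plus a 5-hop blip
env = np.zeros(1000)
env[100:300] = 1.0
env[320:600] = 1.0
env[700:705] = 1.0
chunks = coarse_chunks(env, thresh=0.5, min_on=100, min_off=50)
# the short gap is swallowed into one chunk; the blip is dropped
```

If each hop covers roughly 10 ms, then min_on=100 is about 1 second and min_off=50 about 500 ms, matching the numbers above; each returned chunk is what you would then hit with the fine second pass.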
Unless you mean using that process to figure out what a suitable threshold might be, but that still doesn’t help with the fft/kernel/filter sizes, which are probably the critical things here.
Oh, I forgot to ask about this in my previous response. Would that then mean I’d also want to look at different @feature settings, particularly if downsampled?
I should have guessed. Then you need to spy on your ears and brain: how do you actually segment in real time? The answer is you don’t. You just remember and make sense of the past. So having some sort of past that is analysed bluntly to automagically set a threshold is something you can now, thanks to FluCoMa, explore.
Segmenting on amplitude change and on MFCCs would be the two I’d choose, for loudness and spectra respectively.
By this do you mean keeping a running buffer and running ‘offline’ analysis (in lieu of the real-time version), JIT style? Or is that what having larger FFT sizes enables? Or downsampling? Or both?
As in using either, or using two instances, one with each @feature?
I imagine wanting to know about the past, and whether I’m different enough to be elsewhere, or back. That really depends on your definitions of past and section, so you need to experiment. But let’s say I want to know every 10 seconds where I am: I’d analyse at that frequency, get a reading of the past 30, and see if there is one in the last 10… I have not done anything like this, but it is all possible. So I go back to my hint:
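That “read the past 30, check the last 10” idea could be sketched like this. Entirely illustrative: the fake frames and the mean-distance change test are stand-ins (in practice the frames might be MFCCs, per the suggestion above), not FluCoMa calls.

```python
import numpy as np

def section_changed(past, recent, factor=2.0):
    """Flag a change when the recent window's mean feature vector sits
    far from the past's mean, relative to the past's own spread."""
    past_mean = past.mean(axis=0)
    spread = past.std(axis=0).mean() + 1e-9  # avoid divide-by-zero on flat input
    dist = np.linalg.norm(recent.mean(axis=0) - past_mean)
    return dist > factor * spread

# Fake 13-dimensional frames: a 'past' that alternates between two
# textures, and two candidate 'recent' windows.
past = np.vstack([np.zeros((150, 13)), np.ones((150, 13))])  # ~30 s of history
still_here = 0.5 * np.ones((100, 13))  # last 10 s, same world as the past
elsewhere = 3.0 * np.ones((100, 13))   # last 10 s, clearly somewhere new
```

Run every 10 seconds on the rolling buffer, this answers “am I elsewhere or back?”; because the spread comes from the past itself, wildly varying material tolerates more drift before a change is declared.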
I’m testing it with smaller fft sizes for now, since it’s quicker to hear the changes.
Also, what units is @filtersize in? Samples? Bins? More specifically, if I’m working at extreme downsampling, should I adjust my expectations of @filtersize accordingly?
IIRC: kernelsize is the window of time (in hops) over which the difference is computed,
and filtersize is in hops too: it is a smoothing of the novelty curve (because @weefuzzy famously said it is jittery as ***).
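Since both are in hops, the back-of-envelope arithmetic for how much time they span is straightforward. Assuming 44.1 kHz and the hop of 128 from the @fftsettings mentioned earlier; the kernel size of 31 is just an arbitrary example value, not a recommended setting.

```python
def hops_to_seconds(n_hops, hopsize, samplerate):
    """Time span of n_hops analysis hops, in seconds."""
    return n_hops * hopsize / samplerate

# with @fftsettings 8192 128 -1 at 44.1 kHz, each hop is ~2.9 ms
hop_s = hops_to_seconds(1, 128, 44100)

# an example kernel of 31 hops spans about 90 ms
kernel_s = hops_to_seconds(31, 128, 44100)

# downsample by 4 (keeping hopsize the same in samples): the same
# 31 hops now cover four times as much time
down_s = hops_to_seconds(31, 128, 44100 / 4)
```

Which suggests the answer to the expectations question: under heavy downsampling each hop covers proportionally more time, so the same @kernelsize and @filtersize values look much further into the past.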