Hi everyone, I’m quite new to the library, so I hope this isn’t a trivial or already-discussed question. What I’m trying to achieve is to analyse an audio file of about 3–4 minutes in Max with fluid.bufspectralshape~, producing a time series of the mean of each descriptor over one-second segments. So for every second of the original file I would receive a single array of values corresponding to the mean of the features over that second. I’d like some suggestions about the best strategy, one that gives me the flexibility to eventually change the duration of the chunks and to select a subset of features. I’m not sure whether it would be better to perform the analysis on the entire file and afterwards calculate the mean over the analysis buffer, or to slice the audio before the analysis. In my patch the audio slices don’t need to be played, only analysed.
Thanks in advance
Matteo
Hello @mattebass
There are 3 approaches to doing this, and each will give a slightly different result, but not that different. This is the strength of creative coding!
In order of complexity:
- you run fluid.bufspectralshape~ on each one-second slice, using the @startframe attribute. You can do this by dividing the duration of the original buffer; don’t forget FluCoMa works in samples here. You then take the stats of each result you get (one second’s worth of values, which you process with fluid.bufstats~)
- you run the descriptor analysis over the whole audio file in one go, then iteratively run fluid.bufstats~ on one second’s worth of descriptors. FluCoMa works in frames again, so the sampling rate of the descriptor buffer will tell you how many frames you need for 1 second (we do the maths for you when defining the destination buffer’s sampling rate). Then you use fluid.bufstats~ with @startframe the same way.
- you make the framesize and hopsize of the descriptor object one frame per second. A huge FFT will give you an ‘average’, albeit influenced by the windowing, so not super elegant, but quick and dirty.
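For the second approach, the frames-per-second arithmetic can be sketched like this (a hypothetical Python illustration; the function names are mine, not FluCoMa’s):

```python
# Hypothetical sketch of the frame arithmetic for the second approach:
# analyse the whole file once, then run per-second stats over the
# resulting descriptor buffer. Names are illustrative, not FluCoMa API.

def frames_per_second(sample_rate, hop_size):
    # the descriptor buffer holds one frame per hop of audio
    return sample_rate // hop_size

def second_to_frame_range(k, sample_rate, hop_size):
    # start frame and frame count for the stats pass over second k
    # (conceptually what @startframe / @numframes would be, in frames)
    fps = frames_per_second(sample_rate, hop_size)
    return k * fps, fps

# e.g. 44100 Hz audio analysed with the default hop of 512 samples
fps = frames_per_second(44100, 512)            # 86 descriptor frames per second
start, count = second_to_frame_range(3, 44100, 512)
print(fps, start, count)                        # 86 258 86
```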

I hope this helps!
…as for selection, you can select the channel of the right descriptor post-analysis. Beware that a stereo signal will generate one set of 7 channels per input channel. It is shown in the helpfile, but it is worth remembering. If you want to make it mono before analysis, fluid.bufcompose~ is your friend.
Thank you for the suggestions, I succeeded with the analysis.
As a further question on a similar topic: in the scenario of a live audio stream, what would be the best strategy? Here I’d like to have spectral features calculated on the incoming audio and to compute a normalised mean over 1 second of the signal. Would it be better to save successive chunks of audio and run the analysis on them, for example with fluid.bufspectralshape~, or to take a mean over 1 second of the data produced by fluid.spectralshape~?
My main goal is to compute a sort of derivative of the feature values and compare it against a threshold, in order to track spectral change over time.
Thanks
Matteo
I am struggling to understand ‘normalised mean’ if you want to compare between values (since they would all be normalised!), but the approach I’d take (if a comparison between seconds is what you want) is to use the real-time values (beware of duplicates: the objects spit out one list per control-rate tick) and do the sums with zl.
To threshold on a spectral-feature differential, you could also detrend: having 2 fluid.stats with different numbers of frames (one small/fast, one large/slow) would let you see how a local change differs from its surroundings (in time). It works in multiple dimensions too.
fluid.spectralshape~ → zl.change → fluid.stats 10 = x
                                 → fluid.stats 100 = y
(x - y).min(0) will give you an array of the (positive) differences. Remove the min(0) if you want negative peaks too.
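A rough Python sketch of that fast/slow detrending recipe, assuming running means stand in for the two fluid.stats, and reading the min(0) as a one-sided clamp (both assumptions are mine):

```python
from collections import deque

class RunningMean:
    # fixed-length running mean over the last n values,
    # standing in for a fluid.stats with n frames of history
    def __init__(self, n):
        self.buf = deque(maxlen=n)
    def push(self, v):
        self.buf.append(v)
        return sum(self.buf) / len(self.buf)

fast = RunningMean(10)    # local behaviour (small/fast window)
slow = RunningMean(100)   # surrounding (in time) behaviour (large/slow window)

def spectral_change(value, keep_negative=False):
    # difference between local and surrounding means of one feature;
    # clamped to keep only positive peaks unless negative ones are wanted too
    d = fast.push(value) - slow.push(value)
    return d if keep_negative else max(d, 0.0)
```

After a long stretch of constant input both means agree and the output sits at 0; a sudden jump moves the fast mean first, so the difference spikes and can be compared against a threshold.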
Thanks for your answer.
The idea of normalisation comes from the fact that I’d like to use a single number for the threshold comparison, so I was thinking of having “two axes” for the mean calculation:
- a mean of all the features for every time frame (to calculate this I need to normalise each feature);
- the mean over time of the mean defined in the previous point.
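For what it’s worth, the two-axis scheme I describe could be sketched in Python like this (the min–max normalisation is just one assumed choice):

```python
def normalised_mean_series(frames, lo, hi):
    # frames: list of per-frame feature lists
    # lo/hi: per-feature normalisation bounds (an assumed min-max scheme)
    out = []
    for frame in frames:
        scaled = [(v - l) / (h - l) for v, l, h in zip(frame, lo, hi)]
        out.append(sum(scaled) / len(scaled))   # axis 1: mean across features
    return sum(out) / len(out)                  # axis 2: mean over time

# two frames of two features, each feature scaled to its own range
print(normalised_mean_series([[0.0, 10.0], [1.0, 20.0]],
                             lo=[0.0, 10.0], hi=[1.0, 20.0]))   # 0.5
```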
In relation to your answer, what is the control rate of these calculations? Is it the rate of each FFT frame, or do the objects return values at the audio sample rate? Is it possible to know exactly the rate at which the objects spit out data? (Is there a guide or a mention of it in the help?)
Thanks a lot
Matteo
I’ll let you have fun in the weeds of normalisation. There is a lot to explore there for sure.
As for the rate: all control-rate objects in FluCoMa spit out at the environment’s control rate (in Max that is the signal vector size). That doesn’t mean they have new data; they will repeat what they have. It is the least-bad option we found, since it will always have to happen at that quantisation level for time-critical things (now we could start talking about threading in Max, but that is a much longer discussion for elsewhere).
As for determining when the data is new and valid, the user can decide: it is every hop size, like any windowed process, with the caveat that if you use a hop size that is not an integer multiple of the environment’s control rate, then you need some additional calculation to know where you are.
(The answer is not in any helpfile because it really needs a tutorial on threading, ‘where is my data’, and latency; that tutorial has been brewing in my head for the last 4 years. For beginners, we suggest a simple zl.change when it matters, which is rarely.)
Sorry for that reply. It is complicated because it can be very powerful (to decide whatever hop you care about, for instance).
I hope this helps a bit.
An attempt at a clearer, friendlier, more concrete answer:
With the default FFT values (1024 -1), the hop size is -1, which means half the window size, which means 512 samples. With the default Max signal vector size (64), you get a new value every 8 lists that are spat out (512 / 64 = 8).
The likelihood of 2 valid new calculations being exactly the same (apart from digital silence) is almost nil, hence the zl.change trick works fine for most beginners.
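That arithmetic, and the earlier caveat about hop sizes that are not a multiple of the vector size, can be sketched in Python (names are illustrative, not FluCoMa API):

```python
def lists_per_new_value(fft_size, hop, vector_size):
    # a hop of -1 means half the FFT window (the FluCoMa default)
    if hop == -1:
        hop = fft_size // 2
    # a new descriptor frame is ready every `hop` samples, but the object
    # repeats its output every signal-vector tick of `vector_size` samples
    return hop / vector_size

print(lists_per_new_value(1024, -1, 64))    # 8.0: new data every 8th list
print(lists_per_new_value(1024, 100, 64))   # 1.5625: not a whole number of
# ticks, hence the extra bookkeeping mentioned above
```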