Spectralshape rolloff (5) vs rolloff (95)

So I'm down in London with Jordie Shier (who did a talk at Hudds a bit ago) working on some genetic algorithm timbre analogy stuff, and while testing different descriptor types/subsets/settings he suggested using @rolloffpercent in an inverted way: you deliberately give it a low value and, in effect, get the frequency above which most of the energy sits.

Here’s a vid explaining/showing what I mean:

I guess the rolloff wording in the @rolloffpercent implies it being used at the upper end of the spectrum, but in looking at the underlying code in this older thread on centroid vs rolloff the wording is more of a percentile thing.

Is there a name (or existing descriptor type/moment) for this?

As an aside, I only show it with the ConstanzoPreparedSnare example in the vid, but it works really well across a wide range of material where centroid/spread don’t really capture that broad of a range for where most of the energy is.

For me, it is more a way to describe a low-pass filter. That way you know that the number you get is the frequency with that share of the energy below it, so 5% means 5% is below and, as you understood, 95% is above.
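To make the "5% below, 95% above" reading concrete, here is a minimal sketch of rolloff as a percentile of cumulative energy. It is not the FluCoMa code: `rolloff_bin` and the toy spectrum are made up for illustration, and it returns a bin index, whereas a real implementation would presumably interpolate between bins and convert to Hz or MIDI.

```python
def rolloff_bin(power, percent):
    """Index of the first bin where cumulative energy reaches percent% of the total.

    Hypothetical helper for illustration; assumes a non-empty list of
    non-negative per-bin energies.
    """
    total = sum(power)
    if total <= 0:
        return 0
    threshold = total * percent / 100.0
    running = 0.0
    for i, p in enumerate(power):
        running += p
        if running >= threshold:
            return i
    return len(power) - 1

# Toy spectrum (invented numbers, total energy = 100).
spectrum = [1, 1, 2, 6, 20, 40, 20, 6, 2, 2]

low = rolloff_bin(spectrum, 5)    # about 5% of the energy is at or below this bin
high = rolloff_bin(spectrum, 95)  # about 95% of the energy is at or below this bin
print(low, high)
```

The same function gives the "inverted" use for free: a low percent returns the point that most of the energy sits above.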

Actually, dual-peak data is the hard bit indeed, but this is where the higher-order moments (skewness and kurtosis) help, no?

Just to add: these are centiles, like the ones in bufstats, where min and max can be used as such. So maybe getting rolloff 50 (as the median) against the mean (centroid) could be fun as a single value. But again, even if they are counterintuitive, the 4 moments of stats you get here (centroid (mean), spread (standard deviation), skewness and kurtosis) are there to help with this.
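As a rough illustration of how the four moments behave on dual-peak data, here is a sketch that treats a spectrum as a weighted distribution over bin indices. The function name and the toy spectrum are made up, and the kurtosis here is the non-excess kind (a Gaussian would give 3); whether a given library reports it that way is an assumption to check.

```python
import math

def spectral_moments(power):
    """Weighted mean, standard deviation, skewness and kurtosis over bin indices.

    Hypothetical helper; assumes non-negative energies with a positive total
    and a non-zero spread.
    """
    total = sum(power)
    mean = sum(i * p for i, p in enumerate(power)) / total
    var = sum(p * (i - mean) ** 2 for i, p in enumerate(power)) / total
    sd = math.sqrt(var)
    skew = sum(p * (i - mean) ** 3 for i, p in enumerate(power)) / total / sd ** 3
    kurt = sum(p * (i - mean) ** 4 for i, p in enumerate(power)) / total / sd ** 4
    return mean, sd, skew, kurt

# Two equal peaks at bins 2 and 7 (invented numbers).
dual_peak = [0, 0, 10, 0, 0, 0, 0, 10, 0, 0]
centroid, spread, skewness, kurtosis = spectral_moments(dual_peak)
print(centroid, spread, skewness, kurtosis)
```

For this symmetric two-peak case the centroid lands between the peaks, where there is no energy at all, while the very low kurtosis is the hint that the distribution is not a single hump.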

Now I have to go and check why you don’t use log centroid (much more musically useful)

Yeah indeed, that was the initial use case for this genetic crawling stuff, as it avoided the algorithm getting sucked into low frequency nonsense.

Using all of them in general, it’s just easy to visualize these 4 against each other. The idea would be to have/use 8 spectral moments rather than 7, with @rolloffpercent 5 being added as well.

It absolutely is the case, and that’s what’s happening here. The abstraction wrapper (dk.spectralshape~) sets @power 1, @unit 1 as a default underneath.

On that note, we did some listening tests yesterday and found a really interesting phenomenon when you don’t use power/log, as well as in the impact on loudness-weighting. Basically, without power/log, some drum hits that you would describe as going up in brightness were reporting a lower centroid, so computing the mean in a linear domain skewed the result in a quite perceptually “wrong” way.
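A toy sketch of how the two domains can disagree about direction. The numbers are invented, not taken from the listening tests, and the function names are hypothetical. It assumes "power" means weighting by squared magnitude and "log" means averaging over log frequency, which is my reading of @power 1 / @unit 1 rather than something checked against the source.

```python
import math

def linear_centroid(freqs, amps):
    """Magnitude-weighted mean of linear frequency, in Hz."""
    return sum(f * a for f, a in zip(freqs, amps)) / sum(amps)

def log_power_centroid(freqs, amps):
    """Power-weighted mean of log2 frequency, converted back to Hz."""
    weights = [a * a for a in amps]
    mean_log = sum(math.log2(f) * w for f, w in zip(freqs, weights)) / sum(weights)
    return 2.0 ** mean_log

# Three partials, two made-up hits with different balances between them.
freqs = [100.0, 1000.0, 10000.0]
hit_a = [1.0, 0.1, 0.3]  # weak mids, some quiet top end
hit_b = [1.0, 0.8, 0.2]  # strong mids, slightly less top end

print(linear_centroid(freqs, hit_a), linear_centroid(freqs, hit_b))
print(log_power_centroid(freqs, hit_a), log_power_centroid(freqs, hit_b))
```

With these numbers the linear centroid drops from hit A to hit B while the log/power centroid rises, so the same pair of spectra gets ranked in opposite order depending on the domain. Which ranking matches what you hear is a separate, perceptual question.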

I think Jordie filmed an example of this, so if I can get it from his phone I’ll post it here as well.

At the moment, mainly wondering what to call @rolloffpercent 5 when spitting out a list of 8.

I’d use both values of rolloff and call them what they are (energyPercentiles). You can make a list (5 95, or even 5 50 95 to get the median).
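For offline analysis outside the object, a list like 5 50 95 can come out of a single pass over the spectrum. This is only a sketch under that assumption: `energy_percentiles` is a made-up name borrowed from the label suggested here, and it says nothing about what fluid.spectralshape~ does or could do internally.

```python
def energy_percentiles(power, percents=(5, 50, 95)):
    """Map each requested percent to the first bin where cumulative energy reaches it.

    Hypothetical helper; assumes a non-empty list of non-negative energies.
    One cumulative pass serves every requested percentile.
    """
    total = sum(power)
    targets = sorted(percents)
    result = {}
    running = 0.0
    t = 0
    for i, p in enumerate(power):
        running += p
        # Several thresholds can be crossed by the same bin.
        while t < len(targets) and running >= total * targets[t] / 100.0:
            result[targets[t]] = i
            t += 1
    # Anything still unassigned (e.g. through rounding) falls on the last bin.
    for remaining in targets[t:]:
        result[remaining] = len(power) - 1
    return result

# Toy spectrum (invented numbers, total energy = 100).
spectrum = [1, 1, 2, 6, 20, 40, 20, 6, 2, 2]
percentiles = energy_percentiles(spectrum)
print(percentiles)
```

The dictionary keeps each value next to the percent it belongs to, which sidesteps the naming question when spitting out a longer list.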

That’s the ticket!

Is it possible to compute multiple versions of rolloff without having to have parallel fluid.spectralshape~s (fluid.bufstft~)?

Doing one for normal @rolloffpercent 95 and a second one for @rolloffpercent 5 isn’t so bad, but having to add a third one for @rolloffpercent 50 starts to feel grossly inefficient.

(Looking at some viz like the above with the median included, and running some MRMR analysis, shows that the median can actually be quite a solid feature in addition to centroid, so it’s worth including in a more general analysis context.)

not really - and that would imply a major interface change for the object so not likely to happen.

It is less efficient indeed, but still quite insignificant compared to the other things you do in there…

Ok, good to know.

Maybe I’ll leave the median out then, but the low seems to do a really good job (even just using low/centroid/high) of classifying different categories of sounds on the drum (e.g. head vs crotale). From some early tests it may be better overall vs MFCCs, but still working things out.

There’s a peculiar phenomenon that we (Jordie and I) are seeing where MFCC coefficients 7-9 and their related stats tend to be super noisy, regardless of the type of material, and regardless of the window size (up to 250ms). It could just be intrinsic to MFCC fragility, but given stuff like that, spectralshape may fare better overall in some of the more difficult classification stuff (center vs edge of drum head).

also there is a (long waiting, still pending, potentially eventually happening one day if I find funding) idea of all descriptors being able to be fed fftframes. This is the PR that is now out of date and would require, to get to our level of interface QC, significant work on that front. So do not hold your breath :slight_smile:
