Based on many recent discussions around AudioGuide and how things are implemented there, I got to experimenting with linear regression on every frame of a loudness analysis. I go into a lot more detail in this post in the timeness thread, but I wanted to make a separate thread to act as a feature request.

To summarize, I did some testing comparing the slope of the linear regression function (from this JavaScript implementation) against the mean of the derivative (of loudness), and in terms of “accurately” representing the morphology inherent in a time series, the regression slope appears to perform much better.
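For anyone curious, the difference between the two statistics is easy to sketch in numpy (this is just an illustration of the idea, not the AudioGuide code, and the loudness values are made up):

```python
import numpy as np

def regression_slope(values):
    """Slope of a least-squares line fit, in units per frame."""
    x = np.arange(len(values), dtype=float)
    return np.polyfit(x, values, 1)[0]

def mean_derivative(values):
    """Mean of the frame-to-frame differences (also units per frame)."""
    return float(np.mean(np.diff(values)))

# A cleanly rising loudness ramp: the two statistics agree exactly...
ramp = np.linspace(-30.0, -10.0, 50)

# ...but one spurious frame at the end drags the mean derivative
# much further than it drags the regression slope, since mean(diff)
# only ever sees the endpoints: mean(diff) == (last - first) / (n - 1).
spiked = ramp.copy()
spiked[-1] += 10.0
```

So the regression slope uses every frame to decide the trend, whereas the mean of the derivative telescopes down to just the first and last values.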

There are more examples in the other thread, but here are a couple just to show what I mean.

Pretty similar results here:

Here we see a bit more of the difference:

(edit: the results aren’t always this dramatic (turns out I had messed up some of the maths for the derivative), but they are generally better, and you get the benefit of a corresponding r² value)

Although I didn’t implement it in my Max version (yet), there is also the r² output, which can act as a kind of “slope confidence”, something we don’t get from the derivatives.
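In case it’s useful, here’s a rough sketch of how r² could be computed alongside the slope (same `np.polyfit` approach as before; the example signals are invented):

```python
import numpy as np

def slope_with_r2(values):
    """Regression slope plus r^2 as a 'how linear was this?' confidence."""
    x = np.arange(len(values), dtype=float)
    slope, intercept = np.polyfit(x, values, 1)
    fitted = slope * x + intercept
    ss_res = float(np.sum((values - fitted) ** 2))
    ss_tot = float(np.sum((values - np.mean(values)) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, r2

rng = np.random.default_rng(1)
clean = np.linspace(-40.0, -10.0, 60)                 # steady crescendo: r2 near 1
noisy = np.full(60, -25.0) + rng.normal(0, 3.0, 60)   # wobble around a level: r2 near 0
```

A slope with a low r² basically says “the line I fit doesn’t describe this sound very well”, which is exactly the confidence measure the derivatives can’t give you.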

So it would be great to see linear regression as one of the statistics available in `fluid.bufstats~`.

(I’ve implemented it in Max already but it’s a bit faffy to code, not to mention the workflow problems of dumping the contents of a `buffer~` out into list-land, only to shove it all back into a `buffer~` and then onto a `fluid.dataset~`. And then having to do this per descriptor that you want the values for…)


I’ll bring it to the table and see what the wisdom says…


Ah, turns out you can compute it on the dB itself too, which is even better. For some reason I assumed it had to be computed in a 0.0 to 1.0 range.

The resultant slopes are slightly different, as one would imagine.
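If it helps to see why: any affine rescaling of the values just rescales the slope by the same factor, so computing on dB directly keeps the same relative morphology. A tiny numpy sketch (the +70/÷70 normalization is just an arbitrary example, not what any object actually does):

```python
import numpy as np

def slope(values):
    """Slope of a least-squares line fit over a normalized 0..1 time axis."""
    x = np.linspace(0.0, 1.0, num=len(values))
    return np.polyfit(x, values, 1)[0]

db = np.linspace(-60.0, -12.0, 24)   # a rising loudness ramp in dB
scaled = (db + 70.0) / 70.0          # the same frames squashed into roughly 0..1
```

The two slopes differ only by the scaling factor (here 1/70), so rankings and comparisons between segments come out the same either way.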

When I was working on a method for descriptor linear regression for sound segments, I remember that someone (Norbert Schnell IIRC) was adamant that regressions be calculated on only the ‘steady state’ portion of the sound, excluding the starting and ending descriptor values. FWIW, in practice, we ended up taking only descriptor values from 20%-80% of the array to perform the regression calculation. Also, FWIW, I haven’t ended up using these all that often, and can’t really comment on their usefulness for similarity calculations.

import numpy as np

def descriptorSlope(descriptorArray):
    # with Norbert Schnell
    start = int(len(descriptorArray) * 0.2)  # only take descriptor values from 20%-80%
    end = int(len(descriptorArray) * 0.8)
    values = descriptorArray[start:end]  # fixed: was "descript", an undefined name
    if len(values) < 2:
        return 0.
    cmpLine = np.linspace(0, 1, num=len(values))
    return np.polyfit(cmpLine, values, 1)[0]


While taking a peek the other day with @jamesbradbury we saw the 20-80% thing. Hence the experimenting with the option to drop the first frame in my examples. In my case this corresponds to 6ms of audio, so the slope is (ideally) giving some idea of how sharp the transient itself is, but I can totally see the usefulness for longer segments to look at the “steady” parts.

Yeah, omitting part of the array as a time in seconds (rather than a percentage) might make more sense here.
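Something like this, maybe, where the trim is specified in seconds and converted to frames via the hop size (all the numbers here are placeholders, not anyone’s defaults):

```python
def trim_by_seconds(values, sr=44100, hop_size=512, trim_s=0.05):
    """Drop trim_s seconds' worth of analysis frames from each end,
    falling back to the full array when the segment is too short."""
    frames = int(round(trim_s * sr / hop_size))
    if len(values) - 2 * frames < 2:
        return values  # too short to trim: keep everything
    return values[frames:len(values) - frames]
```

That way a 6ms attack segment wouldn’t get eaten by a percentage rule that was tuned for multi-second sustains.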

BTW, I cannot see some of the links you posted in your initial post (“Oops! That page doesn’t exist or is private”). Maybe the discoursebot is still getting to know me. How can I earn its trust?


You can if you ask nicely… you are indirectly part of the research team now, it seems. Try again.


Or maybe a combination where it’s a fixed % unless the segment is above (or below) a certain duration, in which case it defaults to time values.
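As a sketch of that policy (every threshold here is made up, just to show the shape of the idea):

```python
def edge_trim_frames(n_frames, frame_dur_s, pct=0.2,
                     long_thresh_s=1.0, trim_s=0.05):
    """How many frames to drop from each end before fitting the regression:
    a fixed percentage for short segments, a fixed time for long ones."""
    if n_frames * frame_dur_s > long_thresh_s:
        return round(trim_s / frame_dur_s)  # long segment: trim by time
    return round(n_frames * pct)            # short segment: trim by percentage
```

So a half-second segment loses 20% off each end, while a five-second drone only ever loses the fixed 50ms, rather than a whole second of material.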

Or, to get even fancy-pants-er, it could automagically find the “steady state” via a piecewise linear approximation, so that if there are extreme(ish) trajectory changes at the edges those get disregarded, and if there aren’t, it takes all the frames.

I guess @tremblap bumped you into the dev group, which is where some of those point (as it turns out).