The usefulness of higher order derivatives (on small time series)

Although I’m still into the idea of having linear regression as a statistic in the toolbox (primarily for the R2 value you can get out of the algorithm), @tremblap reiterated the usefulness of derivatives for getting some kind of summary of the morphology of a sound, particularly if you take >1 derivative.

After speaking with @jamesbradbury about this a couple of weeks ago, I guess that, mathematically speaking, you can’t have more derivatives than the number of analysis frames you have, minus one. So with the tiny windows I’m dealing with (7 frames spread across 256 samples), and particularly the spectral frames where I ignore the outer 4, I’m realistically looking at only 3 analysis frames.
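To put numbers on that, here’s a quick numpy sketch (made-up centroid values, and not how the toolbox computes its derivative stats internally):

```python
import numpy as np

# Hypothetical centroid values over the 3 "inner" analysis frames.
centroid = np.array([1200.0, 1450.0, 1380.0])

d1 = np.diff(centroid, n=1)  # 2 values: frame-to-frame change
d2 = np.diff(centroid, n=2)  # 1 value: change of that change
d3 = np.diff(centroid, n=3)  # empty: nothing left to differentiate

print(len(centroid), len(d1), len(d2), len(d3))  # 3 2 1 0
```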

So my fundamental question is: is there some kind of ballpark ratio for how many derivatives you can take for a given number of frames? If I have 7 analysis frames, are two derivatives “too much”? And if I’m trying to get the most out of the morphology, should I even go to three derivatives?

Obviously I can, and will, do some testing with this, but the numbers that come out of the derivatives are a bit abstract, particularly for higher-order ones. So it’s not as straightforward as just doing it and looking at the numbers, since I wouldn’t really have an idea of what I’m looking at.

A side/sub-question here is to do with edge cases and derivatives. As mentioned above, I have a 256 sample window which I’m analyzing with @fftsettings 256 64 512, which gives me 7 frames of analysis. For loudness, I analyze all 7, but for spectral shape, I do @startframe 3 @numframes 3, giving me only the inner three frames. In my early tests I thought this gave me a better representation of what was in the overall analysis window.
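For what it’s worth, here’s the trade-off in plain numpy terms (made-up centroid values, and the slicing is just illustrative, not the exact @startframe/@numframes semantics):

```python
import numpy as np

# Hypothetical 7-frame centroid track for one 256-sample slice.
centroid = np.array([900.0, 1100.0, 1300.0, 1500.0, 1480.0, 1250.0, 1000.0])

inner = centroid[2:5]  # keep only the inner three frames

# All 7 frames leave 5 second-difference values to summarise;
# the inner 3 frames leave exactly one.
print(len(np.diff(centroid, n=2)), len(np.diff(inner, n=2)))  # 5 1
```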

Now, I will probably revisit this if/once we have some way to weight the spectral descriptors by loudness, since I could be more confident that nothing would be over-represented (perceptually speaking). BUT I’m thinking of folding in more of the spectral frames if it turns out that higher order derivatives are, in fact, useful for small sample sets. As in, even if the centroid, for example, gets pulled around by the edge frames, having a better morphological representation would offset that.
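Just to sketch what I mean by loudness weighting (toy numbers, and not a claim about how an eventual toolbox feature would work):

```python
import numpy as np

# Hypothetical per-frame values for one 256-sample slice.
centroid_hz = np.array([900.0, 1100.0, 1300.0, 1500.0, 1480.0, 1250.0, 1000.0])
loudness_db = np.array([-40.0, -18.0, -9.0, -6.0, -7.0, -15.0, -35.0])

# dB -> linear amplitude, used as a weight so the quiet edge frames
# count for less in the summary statistic.
weights = 10.0 ** (loudness_db / 20.0)

print(centroid_hz.mean())                        # plain mean over all 7 frames
print(np.average(centroid_hz, weights=weights))  # loudness-weighted mean
```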

So a little bit of thinking out loud, but a bit more of asking a theoretical question about the usefulness of higher order derivatives on small time series. (lol, typing that out felt like a song chorus since it’s also the thread title)

You’ll run into nans/infs really quickly with small datasets too.
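For example, a toy numpy sketch of one way that happens (probably not the only mechanism):

```python
import numpy as np

# A 2nd-order difference over 3 frames leaves a single value, so any
# spread statistic or standardisation collapses (numpy also warns here).
d2 = np.diff(np.array([1200.0, 1450.0, 1380.0]), n=2)  # one value

print(np.std(d2, ddof=1))           # sample std of a single value -> nan
print((d2 - d2.mean()) / d2.std())  # standardising against a zero std -> nan
```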

Right, so if/once the numbers get crazy small, it’s no longer “useful”.

I’ll try doing some 2nd derivatives on the MFCCs and see what the numbers say.
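Something along these lines (hypothetical MFCC matrix, and the mean/std layout is just my sketch rather than the actual stats output):

```python
import numpy as np

# Hypothetical 13-coefficient MFCC analysis over 7 frames (rows = frames).
mfcc = np.random.default_rng(1).standard_normal((7, 13))

d1 = np.diff(mfcc, n=1, axis=0)  # 6 frames of deltas
d2 = np.diff(mfcc, n=2, axis=0)  # 5 frames of delta-deltas

# One flat feature vector per slice: mean and std of each stream.
features = np.concatenate(
    [m.mean(axis=0) for m in (mfcc, d1, d2)]
    + [m.std(axis=0) for m in (mfcc, d1, d2)]
)
print(features.shape)  # (78,)
```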


I haven’t dug deep but I’ve run into nans before at 2 derivs and so I just backed off to 1. For my purposes matching was just as good between 1 and 2.

Hoping to see improvements by going to 2 derivs, but I’m more thinking about the implications for morphology. Like, are the 1st and 2nd derivs of a descriptor meaningful in and of themselves, as a 2D representation of the morphology of that descriptor over time?

The answer, as always, is “it depends”. I think you wouldn’t be able to query that kind of information reasonably unless you got lucky and found one that worked well; the numbers start to become more useful once you have the sausage machine deal with them as a big matrix problem.

Second derivatives are often useful, but a word of caution: approximate derivatives like this are sensitive to noise in the signal and will get ‘worse’ (noise-wise) with each higher order derivative (and therefore less useful). Also, with the exceedingly short signals you’re using, I’m dubious about how meaningful they’ll be (because we’re down to a single sample).
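A rough illustration of that noise amplification (toy sine plus noise, nothing to do with your actual descriptors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Smooth "descriptor" trajectory with a little measurement noise on top.
t = np.linspace(0.0, 1.0, 256)
clean = np.sin(2 * np.pi * t)
noisy = clean + 0.01 * rng.standard_normal(t.size)

# Ratio of noise spread to signal spread grows with each difference order.
for order in (1, 2, 3):
    err = np.diff(noisy, n=order) - np.diff(clean, n=order)
    print(order, np.std(err) / np.std(np.diff(clean, n=order)))
```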

Sometimes: I mean, you can probably regard the derivative of a descriptor as being a feature in its own right, but whether or not it’s meaningful (still) depends on the sounds, what you want to discriminate amongst, etc.


Indeed. That’s why I want to do some testing to see if the improved results I was getting were because of higher MFCC counts or higher derivs (or both).

It’s a bit faffy to do that manually, but thankfully I have a test/script that I can run these things through and get a number back for how accurate the matching was.

Depending on the results, I may revisit your comments/thoughts about using raw melband representations, now that I can assess things in a more scientific manner.