Temporal similarity measurement/classifiers in the future?

Hi everyone,

Just wondering if there are any suggestions about how to measure temporal similarity with FluCoMa tools in Max — e.g., measuring similarity between two MFCC sequences of different length. I was hoping to find a fluid.dtw~ or similar. Are there any such tools, or is it in the works for future releases?
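For reference, the kind of thing I mean by DTW: a plain dynamic time warping distance between two frame sequences of different lengths, sketched here in Python (not a FluCoMa object, just the generic algorithm):

```python
# Minimal dynamic time warping (DTW) distance between two sequences of
# feature vectors (e.g. MFCC frames) that may have different lengths.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # local distance: Euclidean between the two frames
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Two toy "MFCC" sequences of different lengths, 2 coefficients per frame:
seq_a = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
seq_b = [[0.0, 1.0], [2.0, 3.0], [3.0, 4.0]]
print(dtw_distance(seq_a, seq_b))
```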





This is a deep question to answer. As far as objects go, there is nothing that explicitly analyses the evolution of sounds or time series directly. However, one might imagine that this is something musicians are interested in :joy:

One approach that is fully enabled by the toolkit is to analyse the statistical change across MFCC coefficients and their derivatives. This way, no matter the length of the sound, you get a uniformly shaped set of data that ideally describes how it varies over time. There are issues with this approach though – derivatives get noisier and noisier, and certain shapes which are remarkably different (think linear ramp up versus linear ramp down) present the same derivative information in some aspects.
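To show what "uniformly shaped" means, here is a rough Python sketch of the idea (mean and stddev standing in for the full statistics fluid.bufstats~ reports, not FluCoMa code):

```python
# Sketch: summarise a variable-length MFCC sequence into a fixed-size
# vector via statistics of each coefficient and of its first difference.
import statistics

def summarise(series):
    """Mean and stddev of a 1-D series and of its first-order difference."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return [statistics.mean(series), statistics.pstdev(series),
            statistics.mean(diffs), statistics.pstdev(diffs)]

# One row per MFCC coefficient; any number of frames works:
mfcc = [[0.1, 0.3, 0.2, 0.5, 0.4],   # coefficient 0 over time
        [1.0, 0.9, 1.1, 1.2, 1.0]]   # coefficient 1 over time
features = [v for coeff in mfcc for v in summarise(coeff)]
print(len(features))  # 2 coefficients x 4 stats = 8, whatever the sound length
```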

A patch to capture stats

temporal-comparisons.maxpat (13.9 KB)

Another hypothesis for capturing temporal features is to create data that has time dependencies. I’m absorbing this from my ninja colleague @weefuzzy who does this thing called shingling. Essentially you take a time series of data and clump it into small windows and use this to train a model elsewhere like a fluid.mlpregressor~ or maybe to feed into fluid.umap~.

There is currently an object in the works for doing this, but it isn’t finalised yet. The principle is somewhat straightforward, though, and can be done with some Max-fu 🥷.

If you have a time series 0 1 2 3 4 5 6 you would create a suite of new time series to describe the temporal dependency of your sound. Those time series would describe how blocks of numbers progress… like:

0 1 2
1 2 3
2 3 4
3 4 5
4 5 6

zl.iter is your friend for this job!
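The same windowing, sketched in Python for anyone following along outside Max:

```python
# Shingling: clump a time series into overlapping windows of `size` samples.
def shingle(series, size):
    return [series[i:i + size] for i in range(len(series) - size + 1)]

print(shingle([0, 1, 2, 3, 4, 5, 6], 3))
# [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```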

P.S. Do let me know if those patches don’t work, as we might not be on the same build right now… I can help in that respect too.

Hi James,

Thanks for such a detailed answer — both of the approaches you describe are definitely worth exploring and might work for what I’m trying to do.

Since I’m still getting used to the FluCoMa package and workflow, I just wanted to make sure I understand the derivative computation in fluid.bufstats~ — please correct me if I’m wrong with the following assumptions:

  • when asking for stat derivatives, the output values are the statistical moments (mean, stddev, etc.) of those derivatives
  • the @numderivs attribute specifies the highest order of derivative to compute — e.g., @numderivs 3 would compute the moments for the 1st-, 2nd-, and 3rd-order derivatives.

Is that correct? In any case, I’ll have to try it and see how convincing the results are.

As for the temporal-dependency approach, I’m unsure if I understand the training process. If it were a classification problem, I’m guessing it’d consist of feeding labels along with the training time-series data, but for measuring similarity between 2 arbitrary time series with no prior information, how would it work?

It’s also not clear to me how this would work for multidimensional series, even when reducing the data to 2 dimensions. Not in terms of handling nested list structures, because the bach library makes that very easy, but in terms of doing the computation, which sends me back in the dtw direction.

Anyways, would love to get your thoughts when you have time, and thanks again.



I will spell it out the way I understand it so that I don’t confuse anyone who comes to read this later or myself for that matter!

Let’s say we have a time series:

1 2 3 4 5

And we run fluid.bufstats~ @numderivs 0 (the default). We get seven new values describing the mean, stddev, etc.

If we run fluid.bufstats~ @numderivs 1, it would run the same process on both the original time series and the first order of difference.

So you would get 7 statistics for 1 2 3 4 5 as well as 7 statistics for 1 1 1 1, which is the difference between consecutive samples.
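Spelled out in Python (mean/stddev standing in for the full set of seven statistics fluid.bufstats~ reports):

```python
# Working through the example: stats on 1 2 3 4 5 and on its first-order
# difference 1 1 1 1, which is what @numderivs 1 adds to the output.
import statistics

series = [1, 2, 3, 4, 5]
deriv1 = [b - a for a, b in zip(series, series[1:])]  # -> [1, 1, 1, 1]
print(deriv1)
print(statistics.mean(series), statistics.pstdev(series))  # stats of the series
print(statistics.mean(deriv1), statistics.pstdev(deriv1))  # stats of the difference
```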

Does that clarify what it is doing? I know for myself I was confused and thinking through it in this way clears it up.

Yep exactly, so in the aforementioned shingled time series:

0 1 2
1 2 3
2 3 4
3 4 5
4 5 6

you could do something like associating each of the windows with the same label. So instead of just doing:

0 1 2 3 4 5 6 -> "cowbell"

it’d be more like:

0 1 2 ->  "cowbell"
1 2 3 ->  "cowbell"
2 3 4 ->  "cowbell"
3 4 5 ->  "cowbell"
4 5 6 ->  "cowbell"

The thinking here being that the samples have some kind of temporal dependency on each other, and thus their characteristics are modelled by showing the computer multiple examples of that.
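In Python terms the bookkeeping is just this (names are illustrative, not FluCoMa API):

```python
# Every shingle from a sound gets that sound's label, so the model sees
# many short examples of the same temporal behaviour.
def shingle(series, size):
    return [series[i:i + size] for i in range(len(series) - size + 1)]

training_points = []
training_labels = []
for window in shingle([0, 1, 2, 3, 4, 5, 6], 3):
    training_points.append(window)
    training_labels.append("cowbell")

print(training_labels)  # five "cowbell" labels, one per shingle
```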

You could calculate the distance between the statistical analysis as vectors. This is based on the assumption that more similar time series have similar statistical analysis, and this can be inferred/modelled by calculating the euclidean distance between them. Now, we can make our lives much easier by leveraging fluid.kdtree~ to build a k-d tree for us and we can query that (caveats will come later).

If you want to see how far away something is, there is the knearestdist message, which returns the distance from the query point to its nearest neighbours in the kdtree~. With both of these bits of information you can retrieve the closest thing and evaluate how far away it is. That allows you to make some value judgements about what the computer is doing, as at some point it will likely return a closest match that is still far away in absolute terms. Of course, if the data is unnormalised – for example, our means go between 0. and 10. and our standard deviations go between -3 and 3 – then each of those values will not contribute equally to the distance calculation internally in the tree. This is something to be aware of and to address later, once this part makes sense.
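A brute-force Python stand-in for those two queries (the real fluid.kdtree~ is much more efficient; dataset values here are made-up stats vectors):

```python
# Find the closest stats vector and how far away it is, mimicking what
# knearest / knearestdist give you from fluid.kdtree~.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knearest(dataset, query):
    """Return (identifier, distance) of the closest entry to the query."""
    return min(((name, euclidean(vec, query)) for name, vec in dataset.items()),
               key=lambda pair: pair[1])

# Hypothetical stats vectors (e.g. [mean, stddev]) for some sounds:
dataset = {"cowbell": [5.0, 0.4], "snare": [9.5, 2.1], "kick": [2.0, 0.1]}
name, dist = knearest(dataset, [5.2, 0.5])
print(name, dist)  # closest match plus its absolute distance
```

Note how the mean dimension spans a much wider range than the stddev dimension, so it dominates the distance – exactly the normalisation caveat above.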

This patch should help to clarify what I have said (in theory :joy:) and get you started on exploring for yourself.

stats-on-timeseries.maxpat (37.6 KB)

It is worth adding that for comparing objects, I get really good results with 3 arbitrary time windows – 0-50 ms, 50-200 ms, 200-500 ms – giving them equal weight in the search. A mostly undocumented example of that code is in the example folder… Far from a mathematically rigorous approach, it bakes in some musical intuitions I had. @tutschku experimented with another arbitrary time bundling, with a constraint on the whole size of the matching object…

For more real-time flux, assembling 3 scales of running averages also worked well. I think @hbrown did some good training with Wekinator and that approach.
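Roughly, the running-averages idea in Python (window sizes here are arbitrary placeholders, not my actual settings):

```python
# Smooth the same feature stream at three window lengths and stack the
# results into one multi-scale vector per frame.
def running_mean(series, size):
    """Mean of the last `size` samples at each point in the series."""
    return [sum(series[max(0, i - size + 1):i + 1]) /
            len(series[max(0, i - size + 1):i + 1])
            for i in range(len(series))]

flux = [0.0, 0.2, 0.8, 0.5, 0.1, 0.0, 0.4]
scales = [running_mean(flux, s) for s in (2, 4, 8)]
# One 3-value feature per frame: fast, medium and slow smoothing
frames = list(zip(*scales))
print(frames[2])
```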

Yes — that’s what I thought too, so that’s great.

Also thanks for taking the time to make the patch examples. The 2nd one (stats-on-timeseries) is how I’m used to doing things to compare arrays of equal length — for arrays of varying length I’ll have to try different combinations of the approaches you mentioned and see how it goes. One being the first you suggested, i.e., getting the mean of the 1st derivative for each MFCC coefficient sequence, and using a kdtree to compute similarity with new segments.

@tremblap — I’m very curious about your approach.

Do you mean running 3 analyses of the same type but with different window settings and computing an average of those 3? I’d also love to know how to find the example you’re referring to.



Perhaps it wasn’t clear but my example doesn’t directly compare the time series. It computes the stats of the time series and compares on that, so it will be size agnostic :slight_smile:

This is one of the best musical examples of how data mining can pay dividends. It is worth checking out if you can dig through the code. We have plans to clean the example up a bit, annotate all the patch bits, and leverage many of the new objects for patching, as it is quite old now relative to what is available :slight_smile:

Oh, yes — my bad. That’s great news then, it seems quite accurate. I’ll have to give it a try with a larger dataset.

Could you point me to the location of this example? I’d like to give it a try and see if I can understand it in its current version.

Thanks a lot!


It is in the example folder/dataset/2-various other examples/three-moments-LPTvsMFCC

It is very dirty – we are still in beta and curating the stuff in there; that was personal research I shared with the alpha users :slight_smile: