MFCC comparison

tutschku · May 14, 2020, 8:47pm

I have been working over the past days to compare mfcc data and try to use them to find nearest neighbors with fluid.dataset.

I compare the fluid.mfcc with the results I get from Carmine Cella’s mfccs (latest available distribution of orchidea) and I must say, his lead to better results (I’m using both with comparable FFT settings and 20 components).
To explain what I observe, I made a 5-minute video, which I'm also going to share with Carmine.

https://vimeo.com/user65354774/review/418616716/70a0c3d423

With flucoma, I’m using the median over the entire sound. I compare it with Carmine’s averaged mfcc (20 components). I also compare it to my division of the sound into a few chunks (right now actually 6, compared to the 4 from a few days ago).
You will see that method in the video on the upper right.

But after a lot of testing I find that Carmine’s ‘static’ mfcc, which does somehow take the energy into account, leads to even better neighbor searches than splitting the sound into chunks.
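
For readers who want to try the two summaries described above outside Max, here is a minimal sketch in plain Python. It assumes the MFCC analysis has already been exported as a list of frames, each a list of coefficients. The function names `median_summary` and `chunked_summary` are made up for this illustration and are not FluCoMa or Orchidea objects.

```python
from statistics import median

def median_summary(frames):
    """Per-coefficient median over all frames (frames: list of equal-length lists)."""
    return [median(column) for column in zip(*frames)]

def chunked_summary(frames, n_chunks):
    """Split the frames into n_chunks contiguous segments, take the
    per-coefficient median of each, and concatenate into one vector."""
    n = len(frames)
    if n < n_chunks:
        raise ValueError("need at least one frame per chunk")
    summary = []
    for i in range(n_chunks):
        start = i * n // n_chunks
        stop = (i + 1) * n // n_chunks
        summary.extend(median_summary(frames[start:stop]))
    return summary

# Toy data: 6 frames of 2 coefficients (a real analysis would have 20 per frame).
frames = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0],
          [10.0, 5.0], [11.0, 5.0], [12.0, 5.0]]
whole = median_summary(frames)        # one vector for the entire sound
halves = chunked_summary(frames, 2)   # one vector per chunk, concatenated
```

The chunked version keeps some of the temporal evolution (attack versus sustain) at the cost of a longer vector, so the two summaries live in spaces of different dimension and their distances are not directly comparable.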

As I’m not sure if FluCoMa is actively talking to Carmine, I’m just sending this ‘observation’ to both of you and look forward to any thoughts, input etc.

Best, Hans

rodrigo.constanzo · May 15, 2020, 12:05am

That’s interesting. Quite stark difference there.

I guess MFCC implementations can vary a lot (much like how Pitch, as a metric, can have loads of ways of arriving at it).

Curious to hear the responses.

tremblap · May 15, 2020, 8:00am

Wow, it does sound nearer in the sustain part indeed, but there are a few unknowns in both implementations, yours and theirs. I’m sure @weefuzzy and @groma will chime in too. But the ‘flucoma mfcc’ is just a little part of the process you are using here, as you can see:

1. What are you doing in the processing of the time series of MFCCs: averaging, mean, std, min, max, median, derivatives, etc.? The ‘best’ result you get from flucoma is when you actually shrink time into 4 slots, which would be uneven because you divide in 4. What is happening in the other 4 cases is not clear from the video explanation.
2. Have you tried normalisation and standardisation with the flucoma tools? That will give you different results for sure… on top of the stats I just talked about.
3. What proximity metric do they use? I wonder how they find the nearest; it might be a different way, and your patch is quite closed so I cannot see what database/query you are using…
4. Which mfcc implementation do they use? Are they also normalising/standardising the data?
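
To make the scaling and proximity questions concrete, here is a small sketch in plain Python. It assumes Euclidean distance and a brute-force search, purely for illustration; which metric Orchidea uses is exactly what is unknown here. The helper names `standardize` and `nearest` are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(dataset):
    """Rescale every column to zero mean and unit standard deviation."""
    columns = list(zip(*dataset))
    means = [mean(c) for c in columns]
    # A constant column has zero deviation; divide by 1.0 instead.
    devs = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, devs)]
            for row in dataset]

def nearest(dataset, query_index):
    """Index of the closest other point, by Euclidean distance (brute force)."""
    query = dataset[query_index]
    best_index, best_dist = None, math.inf
    for i, row in enumerate(dataset):
        if i == query_index:
            continue
        d = math.dist(query, row)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index

# Toy data: column 0 has a much larger range than column 1.
data = [[0.0, 0.0], [100.0, 0.1], [10.0, 1.0], [200.0, 0.5]]
raw_nn = nearest(data, 0)               # dominated by column 0
std_nn = nearest(standardize(data), 0)  # both columns weigh equally
```

In this toy case the nearest neighbour of the first point changes once the columns are standardised, which is why the same descriptors can give quite different matches depending on the scaling applied before the query.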

What I like most about this experiment is the diversity of results. Each of the 5 methods gives you consistent results, each in its own space. Considering all the questions above, I find fascinating all the experiments you can do, but also how all these digital debris are useful compositionally.

Let us know what you find out. Questions #1 and #2 are definitely possible to explore with what you have in hand, and as I said before, the time and scale manipulations are far from trivial and will give you very different results. If reverse engineering a blackbox is a motivation for you, as it was for @rodrigo.constanzo and his drum analysis, this is great! If you find answers to questions #3 and #4, that might give us interesting insights, as this might be trivial to implement, or already there…

tremblap · May 15, 2020, 8:30am

Also, he could join here to discuss if he wants. This is a cooperative community of like-minded creative coders trying to get at questions that are close to his heart, so that should be fun!

spluta · May 15, 2020, 4:47pm

Hans. I covet your 12 channel home studio.

Is there any chance that breaking the file up the way FluCoMa does is introducing an amplitude onset artifact in the data? Far fetched idea, but that is what it “sounds” like.

tutschku · May 27, 2020, 9:29pm

Here is the next step in my mfcc comparison.
In order to better understand which descriptors to use to build datasets, I’m comparing the fluid mfcc (20 components) with the orchidea mfcc, equally with 20 components.

Here is a video which runs you through the patch.

You need bach and dada, as well as the orchidea distribution from February 2020.
bach and dada are accessible through the package manager.
Orchidea: http://www.orch-idea.org

This is a zipped bach project.

Hope you get it to work.

Thanks, Hans

tutschku · May 27, 2020, 9:31pm

mfcc-comparison-project.zip (81.1 KB)

tutschku · May 28, 2020, 12:34am

and this is a link to the small Pianoteq sound set

weefuzzy · May 28, 2020, 11:09am

Thanks @tutschku I’ll get myself set up with Orchidea and have a look at what’s going on, probably as a first step seeing if the distribution of our MFCCs is noticeably different to theirs and, if so, seeing if I can figure out why…

tremblap · May 28, 2020, 1:11pm

What would be good @tutschku is to see if Carmine would tell you if they still use IRCAM descriptors under the hood. That would simplify our life since I would just do that… I know they used to but maybe not anymore…

tremblap · May 28, 2020, 1:14pm

I’m asking because I downloaded the command line tools, and the ‘feature’ program says it outputs ‘spectral centroid, spread, skewness and kurtosis’

tremblap · May 28, 2020, 1:44pm

ok here is the bit of code I needed - it extracts MFCCs and prints them to the console. @weefuzzy will enjoy that. It needs Bach and one orchidea external (which I pinched from the package)


----------begin_max5_patcher----------
3998.3oc6ct0iiZjE.94d9Uf7SIq5om5NvlWlruj8kHsRqztZUR1QX6pcyrX
vBimKIJyu8st.z1zfArgp6j4DoziaaCUcN024JP0+1qtYwxrOI2uv6u58Sd2
byu8patw7V523lxe+lEai9zpjn8lu1hkGJJxRWbq8ihWadyrku+0D+p2L8v1
3zDYg4.vku49hOmHMe4i9ZYGJZ981EUr5g3zMuKWtpvNwvbxcna8nh.8+PXU
+z6WJOF6oo3y6j1CXwxnzMK79E8m96u5U5eb6.EuT4GUBS0TrP9IybXwt73z
h1DZxUIzntEZA4N9sdbZnQzsxLQKyWfPsUteezF4Sjp27uxRNn9v2rMRO76e
yt7r2qlZuQoE1euTt9M62Fkjnly2s89Uqta8xVTAXQqp.x0utS7EZwlQ3Fc.
E+nRnqE9KaQuSlVLWHM0WKKXqj8rPzY6joqiiRx13ceVx5VDd5UI7jdEd6pa
HY.qp2dMJfhrMaRjsY7dVAbWTdzVYgL+cxznkVYEc8q7LgY0lhYCcoW644hD
76SxTSqVDb1EK32mkuMxHIhqmC3kVz9FdfilaPnCKgUOnNisxG3qxBf1KDP7
MvOKbXRtIDzUvCcH+4pAS5s7v9O6o7+uIWErnMkApUkAc5TFLCMfwhgoML++
zqGVuLUQ+sEmidUw450eHgR0+S.afh+EI5qx1tU9XlL0x9do1pWiAe166ZS3
wgyUhNXrISGVnI6NeaNdnKLSmtVZkajeZm2O8e+4e969k+Rqx20Epm22pak3
gFAbe6brRu5Pg28w46K7RhSakywrYaolDdTBck5hIdodoJU16RKdv6a9Ft2q
we621pHhlqbVKWssw0C4CJi0pQM5Cx0uSM+Tmr2EUTjGqxK0Vb1M05faV7AY
993rT0LZoLW+gAHLhX9PiFbxp3QpyPzKWF0VxgD17lZL0RGAn4Jm+ygOwpje
7dq5quVtSQRXu2pFTu1p+6JAIZufjM1H1mLpfhuHIJipUiSduM5PQlGtUKy4
snCZnvljw3p53Eo9LKWIiqkQpxxuaiL06seLNcc1G2G+qROFJT381Gx1Y9MB
hE38V07YcrJlfdtsW8ddu8dYTwgbomt391p9cZp.bUVRVtUmhtCio9DU3dzc
7.DMjneku9sTuG9nUitSSBoNf.gefx3HPbI4JY9tKzQ+Zz3KyXZhJdxZw9rC
4qpNc0IK3QpGy0x8EwoQEJ85QeKcqg75Jz1PGJcS07P8LRZilqdjzo21+HoE
b7iJ670V.GM+CsNfyQeo5gFeUCsX.iLcBztCQBESw.QvCQWNECDZHCzjPlnA
YCHlhgBOngBMIC0fDKzTsTg6YjXSw.wbk2Jwf7VQmpQpOkG4oKS1nLQ61UlM
fYXzChJ196swC8u07qwo1e0jLnpT0ODW88MUUtHRGmuPEFTEp1Dt8SBax2K1
lobBldHtzOnR7dUYX4FQ1LszPene+g0wY+yBUX+8u6GkoGrQtUR08QGRJNUU
rby8wII0QuONCmp3qkw1us7sOIROIjgwA536TcLeg4UUQ5qzmkGCt5fXbVHx
jT.SP7YlDEXADJmpeEpwgQdbrP3P6HfBCPL6qTuE9jrJT5xzM1bVHl5KJe2c
4Y6xxqVSUGZX82Wkn5l7n0w5RoqBzUmE2skjTt5SalMzhsJEZb0wbL4UuX72
iR2iPkcgdwioK93RfNWOij9t3TMGIqDXzsG8iiyaZ4lSVDB7wLrQeFx4HrQw
HHgBB4zCauLQAXJEvQGMt7be7nbrbW1W8FLyoCuNEOyvGDpesYhTtBc7vexA
0oroli5VZbxWVvUmNiTo+Oe6qrjySlvkWlmIeB6GvnVUKlUMcPJlkD1xjnOZ
A2Es7i+HB+t+wg7cUWKi1sZqQllKh3lpyNfqybD09LarDbaieFD1EOhopB.H
0Kvj.k49o1nKhVsp4hrOkagXAGaWiCwgDL+rqIj.rY4yWU7EGaNnvPh9v6io
XLJJvLLjPJAYNVetfEhaZq0uCxZmGs6uSUYF5Q0AGyBIM7Xcpit90eOwK7D6
YrzAX80CJs39np.lOpZzuutzyRjB0iqFsfoLWDO5ugnz1Cw7oxFXn9be3Det
m2B5Rc5JDLlv5pkpTzVseP.1m1OoqhY5aO.tHPXi5w7Q7vNY8yXud4.5nwmS
w5f5f2Ap4dPf8UZ2jmkt6yY9TGEucXcztr6C1dMFvM.2bHtA.G.bNF3.jCPt
tQNz7fb.zAP2j0MlFjEvV.a8L3PC.O.7lSmZ.eA70yiiMnma.18LfcT.6.ry
0XG3qCfNmCcfmN.5bOzAkS.Xm6wNF.c.zMC8HARbC3JG6LCRaCPNGibPzS.4
linmbfq.txs21Qbn9S.5bMzI.jCPNWlxFTVJ.bN0GG7LK.H2bTi.7vI.j0yg
yL.6.ralbnA4lAb0bvUv0E.3p4fqfl+Cb0zyUf2JfpldpB7UATkSazJbELAf
yo8l.t9k.x4XjCt5k.vM4opg.nBfJG5EChZB71j6DCZk+P22RmY7XbDri88.
TB35wk3FTgH.bNF3.jCPNW13U3NzAft4ImdXmBEXqmOGZ.3Af2r4TCpCEHq4
grfalGfrlCxB7XAb0bvUf+JfqlCtBt8CAtZN3J3tLD3JG2Pe39LDfNmCcvcZ
HfbNsoqPAl.v4TebvMmAfbyQMBv8fAPVOGNy.rCvtYxgFjaFvUyAWAM+G3p4
fqfl+Cb0zyUf2JfpbZR8veft.jaVbjAkJBj0yfyL.6.ralbnAcn.3p4fqfb9
AtZN3JnCE.WM8bEPU.U4zj5YPB8.x416SLFTEI.cOC94.rCvN26qC5qAfbNF
4fVd.HmiQNntU.4bJxAOEm.v4VfCJd.PtYniubnrTfrbuyLnrT.4bLxAkkBH
miQNnrT.4bJxAa5F.v4TfyG.N.3b4ER0GpVEPN2hbA.vA.2D23MnXTfobnSL
3AAE.NmVY.7DhBHm68wAPG.ct1OGbkr.fyo.GT5..bNE3fqhE.bND3.bCvMG
V1.bm7B3lCwM3JzC3lCClBWed.2bn2M3RkB3lCwsP.2.byUASq56VySSILN1
yFbkJ.30gv6QWZrICfgK2F.wtFhgK2F.bNE3fK2F.bND3HSbP5oNqU1De93S
54qZ1kHWepgZSyNlPYuYPMB122fOAbFOj1osixfSCkFrFypnODgxIML3jIxs
MMV6CWOlgVdnnvBlcN+CnBt01ODolS1yLihHFYRz97Gol9TyPSEJ+DlCGEp7
hDzxjXU11skl.ONKVkHixa5OQMaPZUi9UDkALwnhXAmd9TVUkFhGc5Fjmo1T
mTBmYDUZ8JA0u58XzSG5zr0x8mNxWqYLtGLrNu3VUi8M5GTGxgwcHoG1tTlO
xiQ9wrkuebGyV498QajtVaVqOACavv940v9HXDLtmBMJAzlSn1jBZyITax.s
4joMqxzus46UWGcP.BKLAI74bpsLTlONTWF5YqitLRmoVZSUn3PBl6e5QcQp
yKQltzZvqqll5qx..a6hPURMlePncUI84paGcGWz33Fc49NtBbsZWtNtv0.V
e4Dcg.6oI500jczoRgECHqvw5P3zZvGpegp6HpVbWN9UsSz.breHWbMVa1Os
iFWMrUvwXt8zCaj003bqMa.quLwFa3.NGSLtsDTepv9JjfXdEm2ePSmEPoME
T4GsWYUtRqyGWwlGKZxecczpuL8NR5vx33gdSTb5Wdtcg0YzL1oS18Iwqah.
yyr83QM6PQhrnyReM8zmX6oOqt9YTYWtexYKN87mLJVqMLsAP4DGYLOBKUP3
gEE7xbnyGoC8p64Pk60BYtx4xtG5AjprCBJaZgZMRna8f4UrNVZzW3.9o9Q4
1va3dMBMKobp9+rQC0mrNTiCoqMWZ1mC9xebAMKgzLG0hm3ntgXnHLLyZtoV
GnFBiREDcGkdhpYc7ph69Pr7imWynOiVTkyQVKXAITPHsbJyGfYbHUnTw1Sn
8D6SD75Y3YTzyjexc4x8MsaaY3OyZ04tXcSfuRbitlseU1N4WNmil5KrHRyl
3wzNzNmEM6cmNkCUhGYG1clYx.T+aiJxi+zph7jwedNibTuHww5kIqyELWzZ
eH2kD84kQ4mYBfB4r.iZAiPkvm5Uj5z25oslmRxjPhs3rPjfxsYEF3y88Gfb
ozCjRekg9HaWkohPexPbaxpim4WEAhTFK5otNiVdtHipDdQAiYEYTEfsNNpA
OLtzK3CItnXjwEC55ZXb44G19DyejSr481n8JBNdd61gkW4DzTkK7VTXXt7e
gzVkqKOo+Djp2.xOpWmmP9QW30aTO7UqwAU4b3qzx1zQ+idJWcUFw3lz5kx+
LmJ2DFW7qvrB6s8.WSVgekmXWO20Yg02Q.PbPHNHDGDhCBwAg3f+QNN3nC.R
f.fP.PH.HD.DB.BA.+ZL.HEB.BA.g.fP.PH.HD.7qw.fLH.HD.DB.BA.g.fP
.vuFC.xg.fP.PH.HD.DB.BA.+ZL.n.B.BA.g.fP.PH.HD.7qs.fXzz9PzzTD
vAlkSFJjZiOxTVv1crD9ftoVe7RT1u.89cQEqdPliPcsmQO3S0+yFl5e+Pbg
7bZnd1xvlj0oj3MOT458LSklOUrnPkIgAXBUJe+xW4GLncGtJqOdseaZWOy1
WrbsM5ShkYeZFeJtZrjnDmyl8fIx8iOqfk4oMsBr8wkeXDk9og1Fb33.nkAA
m1oU0VAzDuVLAOCaUaOgO8Um4YXyOHPo4ts0W8B5gX6bKemZcLrUwxc1ggRW
1sPpZaclRlogSJYo+FO5M9E2Cp4v1Mlu5sbmNBkO861NuL2bounMootJ23Ey
1J2ksUk7jDOur81jvQaB95+naDNCaGObNXhCl3u3LwGrssUQ72RNHOu0casn
YL6JSW4D7GxkxzQOCGZOOuxI2+Qljj8wQO6BHbruMqRrs8GHdHuoOqFq7D8W
BgBlJAvrAx8CYIqG8z2udC2lWtugp6ZyQNeljIXV96iR8rG3bF7IzODa2XYD
TjHvxxXx7D4g5yr0ggn7.6NcNhJXrAzjnPM0XlcXNtri4Btn4l79yzeSS5dK
OuYnqAuwh3h5jFLMZ63dOUiz1t4xsG8ioy3X2g7cWncge81bKCq+anysGcsG
NiYBEwJ2P9IbjHz1Lw.tfDLkxUtbkL9Cx8i0gZMI8nLg55uHAW9jy1VOsOyQ
zJsd8V1lihd25SmDAZuLc89WZQVKh1WHu+PRR4FJ7v0zmaGTsSMcX8EHRT62
S2Ae+f4PnFsLQTtFs+YthVqyojPZXv0HSO6cu8i80ZbcxOUe6exaw2matrGU
S7osyt8DnmT2TWRstgTFvRaTeRBa2GspZxblD4tDsn5X98W8+ALrx1zB
-----------end_max5_patcher-----------

tremblap · May 28, 2020, 1:54pm

So @tutschku now I can confirm there are many unknowns just from the outset, as I was pointing out in my first message. The most important one is that:

for fluid.bufmfcc you get a time series of MFCCs; you then need to decide how to process it statistically for the whole sound, and you can then normalise/standardize it yourself. All of that is left for you to decide and experiment with.
in orchidea.db.gen each file gets one value, one window, which means that someone somewhere has decided how to deal with the time series… and that can happen before and after normalisation of the data itself.

Since they do get amazing results indeed, we would either need to:

spend a lot of time trying to reverse engineer 2 unknowns…
get the information from the orchidea team on what they did.

I know @danieleghisi worked on this project, so he may have some interesting insights…

tremblap · May 28, 2020, 2:45pm

Another question for @danieleghisi - I wonder how the distances are measured in the dada query of the nearest ones…

edit - I saw @tutschku actually uses fluid.kdtree so that bit I know

tutschku · May 28, 2020, 3:00pm

I asked those questions to Carmine a while ago. The only detail he gave in an email:

There are several ways to compute the MFCC that depend on which version of the DCT you use. I use DCT-II and, as I told you, I perform an energy based weighted average of the coefficients.
PS: my computation is compatible with Dan Ellis’ version and I checked this with him several years ago.

He thus takes the amplitudes into account, but I don’t know how.
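
One possible reading of "energy based weighted average", sketched in plain Python. This is only a guess at the idea and not Orchidea's actual code: it assumes one energy value per frame (for example the mean squared amplitude of that frame) and uses it as the weight of that frame's coefficients. The function name `energy_weighted_mean` is hypothetical.

```python
def energy_weighted_mean(frames, energies):
    """Average each coefficient over time, weighting every frame by its energy.

    frames:   list of MFCC frames (equal-length lists of coefficients)
    energies: one non-negative energy value per frame (an assumption here)
    """
    if len(frames) != len(energies):
        raise ValueError("need exactly one energy value per frame")
    total = sum(energies)
    if total <= 0:
        raise ValueError("total energy must be positive")
    n_coeffs = len(frames[0])
    return [sum(e * frame[k] for frame, e in zip(frames, energies)) / total
            for k in range(n_coeffs)]

# Toy data: the last frame is silent, so it does not affect the result.
frames = [[1.0, 0.0], [3.0, 4.0], [100.0, -50.0]]
energies = [1.0, 3.0, 0.0]
summary = energy_weighted_mean(frames, energies)
```

With a weighting like this, quiet frames (decay tails, silence) contribute little to the summary, whereas a plain median or mean counts every frame equally.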

I would suggest that you reach out to him directly as those details (Dan Ellis’ implementation etc.) are outside of my expertise.

tutschku · May 28, 2020, 3:04pm

I understand that part. For now I’m just comparing his black-box results with a median over the fluid.mfcc

tremblap · May 28, 2020, 3:19pm

Indeed, but that is a shot in the dark, as what he does inside that blackbox is definitely where the difference is, and it might have very little to do with the MFCC computation… hence my questions. Do you know if he is using the median, or which other time-series statistics they compute?

weefuzzy · May 28, 2020, 3:30pm

I think what @tutschku quoted from Carmine gives us something to go on

They use DCT II. I can’t remember offhand which one we use, but IIRC we followed Essentia. Sounds like they followed librosa if they’re talking about Dan Ellis (or at least its default: librosa provides different options).
I can sort of guess what an energy weighted average is, and I think it tells us pretty much how they’re getting their summary statistic.

IOW, I think we’ve got something we can test empirically in all that (and I’d be very surprised if they weren’t normalizing the coefficients at some point, but it should be pretty easy to determine).
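
For reference, a minimal DCT-II sketch in plain Python. It assumes the input is a vector of log mel-band energies for one frame; MFCCs are then the first few output coefficients. The `orthonormal` switch is there because DCT-II implementations differ in how they scale the output, which changes the raw coefficient values even when the underlying transform is the same. Which scaling each package uses is not stated in this thread, so the function below is only an illustration.

```python
import math

def dct2(x, orthonormal=True):
    """DCT-II of a sequence x.

    X[k] = sum_i x[i] * cos(pi / N * (i + 0.5) * k)
    With orthonormal=True, X[0] is scaled by sqrt(1/N) and the
    other coefficients by sqrt(2/N); otherwise the plain sum is returned.
    """
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        if orthonormal:
            s *= math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(s)
    return out

# A flat log-mel spectrum puts everything into coefficient 0.
log_mel = [1.0, 1.0, 1.0, 1.0]
coeffs = dct2(log_mel)
plain = dct2(log_mel, orthonormal=False)
```

Comparing the range of the coefficients from two packages on the same input is a quick way to spot a difference in scaling before looking for anything more subtle.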

tremblap · May 28, 2020, 3:33pm

another thing we don’t know from their blackbox is how many mel bands they use at first… so many variables!

weefuzzy · May 28, 2020, 3:37pm

We can use librosa’s defaults as a starting hypothesis. @tutschku has already opened a dialogue between Carmine and me, so if we get stuck I can trouble him for further details.