Onset-based regression (JIT-MFCC example)

Just saw this, sorry. kNearestDist will be reporting the distances between the data point supplied and the k nearest in the tree. It’s diagnostically useful, both for getting an impression of how well described your data is by looking at the overall spread of distances of points to each other, and also practically useful, e.g. for determining if the data point you have is a complete outlier relative to the rest of the data.
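
As a conceptual illustration only (Python/numpy, not the Max patch; the 3x threshold is an arbitrary assumption), this is roughly how those k-nearest distances can be used to spot an outlier relative to the spread of the training data:

```python
# Conceptual sketch (Python/numpy), not FluCoMa code: using the distances a
# kNearestDist-style query returns to flag a point as an outlier.
import numpy as np

def knn_distances(data, query, k=3):
    """Distances from `query` to its k nearest points in `data` (one point per row)."""
    d = np.linalg.norm(data - query, axis=1)
    return np.sort(d)[:k]

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=(100, 13))   # stand-in: 100 entries of 13 descriptor values

# typical spread: each point's mean distance to its own k nearest neighbours
typical = np.mean([knn_distances(np.delete(data, i, 0), data[i]).mean()
                   for i in range(len(data))])

query = rng.normal(5.0, 1.0, size=13)         # something far from the training data
print(knn_distances(data, query).mean() > 3 * typical)   # True; the 3x threshold is arbitrary
```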

Indeed – it’s possible that MFCCs won’t be the most sensitive feature for capturing those distinctions. MFCCs (kind of) tell you about periodicities in the spectral shape of a signal frame, i.e. how much wiggle there is at different scales in the spectrum. What I hear in those sounds, though, is a shift in resonances that might be better differentiated with something closer to the spectrum itself.

Yeah, the buffer flattening in particular is a bit gnarly to do in Max. Remedial work is underway! I might be tempted to start simpler for these sounds, and build it up. Replace the bufmfcc~ with a bufmelbands~ (maybe 40 bands to start) and use a smaller range of statistics, maybe just mean and standard deviation with no derivatives.

With pretty minimal testing, here’s that part of the patch, changed thusly, with a couple of comments added to the buffer flattening. You will also need to change the size of the buffer~ entry to 80 samples.
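
The 80 comes from 40 melbands times 2 statistics (mean and standard deviation). A minimal Python/numpy sketch of that flattening arithmetic, assuming one frames-by-bands matrix per slice (the channel ordering the patch’s flattening produces may differ):

```python
# Minimal sketch of the arithmetic behind the 80-sample buffer: 40 melbands
# times 2 statistics (mean, std), flattened into one vector per slice. The
# channel ordering that the patch's flattening produces may differ.
import numpy as np

frames = np.abs(np.random.randn(20, 40))   # stand-in: 20 analysis frames x 40 melbands

means = frames.mean(axis=0)                # 40 values
stds  = frames.std(axis=0)                 # 40 more values
flat  = np.concatenate([means, stds])      # 80 values, hence the 80-sample buffer~

print(flat.shape)                          # (80,)
```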


----------begin_max5_patcher----------
3358.3oc0bs0iaaiE94Y9UPXjGZ2NYJI0EJt.K1zl1jEXS5VrIKxVDrXfrLs
G0QVRPhdtzhle6KuHIKIqqdjlwMO3vvKRG9wuygmC4Q42O+rEKitmkt.7WAe
Fb1Y+94mclpJYEmk8uOawV268BbSUcaQH6tnk+5hKzMwY2yUUGCDkRb83u7M
u98ubkK2MuK9qTcPLnWRn4UFtaazNd.iqdnnrZ0UweHloknEKcC2r.7+xZN1
k6cse3lqRXdbcOvWBu.PLj+hPlx+BKppXDhWieX82h5wvRxliYSxyVr1OfcK
KI0OJrTuOagabbopOqzPjHyuFodPNWTTkentJTQUIra8yGuQQstIhICWLS1k
nltKt21bw9GSzJVR3Ne0SRW4ebdtHoVMBc2xRic8zCVtnk27d3whpPFnBYrM
gx+xzAtGgDq1aBh7tgoVjf4UFEyB8CiSXorPtKOS1KZdEas6t.9UqiB4o9+l
RBPxkhFZeclD1Xix4fR7+tDe2fhIvlD+UQgRgnxJgr57W2mAHK0byp7jQ0iP
23FFrf+IfkVZLULI2ktzMQtPsLP8Fv4MxihBp1Tw3BXq4YMG6GFVCE4Qws2X
h+lq6XrKiDMtsqmspkzq1Epa8JAmfeUp6sUQataPPlJa0G+8tg9ac4LtudI.
CKZjE5JlnWm5kDEDTY9pa41FZYkfi6wtyeE+Z0KpLYPzc+3bRzhhU4U9aXo7
p0wc2jVslT9CZPuTU6VloCeEmsMNPLKp1gJF0JqvV13Vk5qYjyKZ6VAyuPcr
rkt6hRtAHrTAtSH.L.+ZF3+xuFj5JjDVJHZMv6ZWwJR.3W.gL1J.OBrIB3GJ
FTBXsPZARyif3H+pug.+PlWztPdY1WMKnVkGPUinvRszjkRaxkVByiVHkURG
k1igQIsmCMXlW+eb944EtXFPzOcsum..kFZR49do.WAtdGCD66ciXR72WzLV
X9XwBjsSIr.COEvh2H3Gbow2M.WvxcqWyRjTJI3jJ3PBtjacNzkfOIfK+f.v
0Bse.Q24K.tgq.h4vKWybk6wjlyKSAeE6xMWBPF.eQkBgIJTLdAG98u40uVL
vnDfAXs32XeAz8sAQ6VExRS+VOgPmD4u5xKu7qak3Zz7hkwQuXgv1pkGplvp
nuljSg0peZ21k5kG+PQ+jqYEPL3uIw9rlyWAZgHiOdrw.q2fG9rPjq3KXM+A
c8tA3C7aaJ2wbFUpkC7HbQkYVCPhi1KPRluN0cI7P.AOg.hvirT2MrFQDgVY
BWxO.u.0lEMq4BUxs9i0NLSdZgkN3IbvxNnIlNsCHFcAHp.Gt.rPsEq9uFFw
AgU+k1U4tQnmHMIgCdcqKgHsCR3t.oQCNXjRexFcx.NABOFVuS4sErMND8HU
pjO6gxZn5HrbdZAltL2.AsYkwFMu1dycqhps3bBXj4aZEKnNGIVLDklb2Lo1
Jj3zAOX2Gm.dgOB7Mfu5E9XveAf+51.n4diZScX7PxI1N0qSb2x.u.eAPD.K
ubUnK.5sWZFvfyLfQ0ZW5i+3jfNsNXmHn.QvJBmfiiRYeA7pTQrtdrrvVdkD
A8CUmiDPFCwCfWIjSEblBPp+gzwHU4kxiiR5JMtMyWzGoOAqChbUa7MPD2P6
zjE14zwmfeyW3hRqa44X9HgnxtO0GHYAKEBhE5okUpdpsfBDyiTSrxoN28zV
6mnQwusOqKGIWdPkkOzJkvGthc+TSizSu1vHxQG6oMQC.lCEAPOVDX+47JOw
gVNfOkjJauYnQaXJagN26YgcmxRcIyUk5nLjUYGGzRyXkB0CGN.oPt8hriyh
THi+bPRgzF77IENCUJPysTfGxJhybJEzAhEJZ7bIE4.cuRg4bxKxo9OurS5P
Ym4JSyhTH2gePVsHj4VJFhNh4bpiH2mePqHNlyIuXnVvyM0OeVsFBufNm6lk
aLZXRQyXQVk4W58B4Mat5J8s3dkKmm3ubGW6DP4aweT213lfnktAY2kXwcQ2
vkQd9dgS8qFjdDoKQQHSpHj1Gvz10ddh.lzwMo9sgzn.6zbZTXzYZTzXjOsl
SEl5fMMn8jSEy.jrkEHD3UkPE+v3cbvqJtHqLTZeLjXKaQyq4ohosXxHBjD6
nZW8fDNol+1Z5NVq.sj4GZU9MaouZBCqgfrpdT0oWMPWWwLCtOz8xJJjsYXp
jQ5CVRG7axY.uIB8f2jdJVKWajujZ4XSs7q4vbqo87podN0nLsnyufZv59zR
Y2J+nOnRNjqdOKbmlbjmBKUghkaV6GD3EEDcPpEkyWVnasHMdx66mAvKwTSD
xQ.KWZfLHXaUIQAKqxFz0iAkOHSKSJDK6poMlXZoJ4fMrLjkf0FFd+6Bhn52
.j5.M0kDUgDuqxCSPtyRFFRoL5HNIJNJoHqftzfVz+c7nMItq7yhsCVwv4EY
LoDQq0SqiEaE.pe9XJy7JVLd2NO2tV.XALYLkUPUhisMQAKNPjiitjoEDiqL
SWjJFrmbFUczXnMUslfrIXMl5.gHypCVZBSMtq7CkzWVANaPwHENiKVnvDGS
ip3rf5T40RglHBUIpTKSBQKzXpdhHL0kOtJixjXgLT8.SMDCT+ZQhYP0WV47
tRnO9uhYgfOHO7sOv15uLJX0hRxU+T5hk6VX0CjoUkW6TvlcHPKGGcoC.t57
ZaKnd4xV9Ghtj9IMKz5LXx0yqNuyPtfqlrl4xjTW1LesXRTHduvMntTHJmdd
eFff0HA4Yl2my8C7QKP+72wuRML3HDqx9V3EvDlu2CjpSSTnPHTBrzKrB9.w
Xfpepwl+Sm5aZBYSkJOrpCxRZwVqPZRfJNtjhazs5lNQG+SqF1fG17qh0mx9
jvn+zGeyX2xwzFgf5choXaS0RiEEIPoCIFkTCbZmehrMDTL01PDSaJMqDj1A
8DIchPuegjSp15gj6NQoAU3WWURkzGD09cHLxxVaIStdqdNz1H2+iGVkDsgE
9QE08Ymi2mQ3SDmiZmhaAI1p2Oz1x.oY.hED7jtKhJvLeugawlNk1dcLnDsU
DCZl8TYUVFcvsMyLy1To941sJY0HyukExt0cbz3F7uGMtY5QyFqipYNddXoi
hC2KUs20jojq9863bgr0Ai8PeHbJ.RXVoIWr9ghuggSHg5st9geYfRkXktXy
hCKM8hV5cxDLtKg6.lFJSqc+OyhLgOoDp+YZf+JVxn8DYf1IaN32gN5ZFza0
75SHd8N1pwhU1CUrq0qoTrEwyk3euGOYjVQlOAhwYIeYtnc0HNOold9onU4o
i+zOwZ2cowsI8jNg62DROpwSpzn9zH5RXZxlzgaZNCh1O66wmIysSpXJ+1L4
mbda7AunX1W59XxOJcpZgXmeJDTC4elKtvGcWNVRv.87dwyGy4ihP1VNZGnq
sac6ABLuFN9XzFY.KibQwPDFMAo2Pwz.oBhxrxMj7L5E7+YaOWrzSniSex8V
15njseYrll6lCCuz54DnW4lbyKCkeu0uTcj.cM4JNPiQdhmsbXClEFAHEW.T
wUA4nfmI3bxP8Pzm7yWXxOirdOegdOt8Igo7qY4BBDl8AiLEOJ7XNZMqVuKj
NOOSXuLw8GW9AEZ+xJVFvD6SnudvuOJ5lo6zvF1b4zlsNQTN+vaFczpBgG5T
rSqgMTXQpkaEaDi4PJ29qQP0cixk5+HDFvqr9wAOPqsi80zlNwEU9spf8nRm
i9komPa2OAWFWGS2IQM4FcLYeRlZOO2NKEHckneWnaTwsXyeqBqc4K7cc1HP
pYVFcPoFNjrRDmIc2ust2aW7+iP83dzwceO0VRJ6WXCjJmpgQQxH4S6DtzWP
z.NVDSrgElVc2JoEKzTKVU99Dmt0hGsGlNEW19gk5vTEwQlUZWzXomvMteL4
oVUsigsJV9auZnG5Vob5ax8uUm5seevN1KQi5FNjYkHIKID1aCiV1x0TJfIQ
2ENZIrWaqSnD95GbGu.BwTKc9fPHFFXstCFI7wZ5Ev2lvXGgDpPuhyTCYK8G
a5Et2KrrExcGs34XBss0m1mMTHbJnzlhnVSuL9uYqFs7QJbbCUDUCLaGioV9
9EVPPzccKhGlocVTHzoy8bwV5DMg3fbTdiJFiko0rMAv+YbBnt9h2FEbDTjh
SuvR3WikNmdJuk1jHfw6RhCNNmGHEo.oIxRn+qLRIMCX0iGplPsiQXKnMUoD
fcrrwNS47Jg4w7us6atqINQQrj6mSvIOWASzgnHoEi3BP5kPzXDulB6fzr8O
HYYHtfjSmzcRRYgqRO01AYOH2stWsL7kffD8EFYZCoYYTOAY068Y7znvxci6
yNXoyk6ct7Hv6jQgtXxd+ob15cAA7dufk5ImpikERev11FDC8W6fsP6pqK7R
9stPTc0tvEe4IHPblVPUOo3G0kFcjgpicHNZeTLJzBLvTCpyyOdH+.HOk9dA
tSdZNOLbQxvt2ibDJARU3kFY3mPEN6tCFzGMvv9FRZ4.EQTJT6VpPAvL6t1s
gPCTm4M8O3mve.7iahNJ0Y8mNWK+eB84+w4+elQI9bB
-----------end_max5_patcher-----------

to make it an even quicker replacement, you could use 13 melbands between 200 and 4k and keep everything else the same in the patch. you get the same number of values, so it is a one-object replacement (a quick experiment!)


Interestingly, with this melbands version of the patch, the numbers I get from kNearestDist are more in the 0.001-0.002 range (rather than 13.-20. from the prior version). I guess this has to do with the types of units in the dataset(?).

Ok, so I gave this a spin and it’s not massively different. I seem to get marginally better results with the smaller sample/fft size (256/128) and about identical for the larger one (512/128).

Maybe it generalizes better, but it’s kind of hard to tell. Basically, for these tests I run it with small and large pre-recorded files, and then by playing live. I can get exactly the same results on the fixed files, but that doesn’t necessarily translate to better matching on the drum.

I just tested both on more real-world-ly different sounds and both perform really well. I think, for the sake of homing in on things, the center-to-edge difference is probably the smallest difference I’d be trying to train.

Based on this I tried doing the 40 bands, but in a smaller frequency range (@minfreq 100 @maxfreq 6000 and @minfreq 200 @maxfreq 4000), and both did much worse in terms of matching. I don’t know if that’s too many bands for that frequency range, or if this isn’t how it’s supposed to work at all.

Perhaps something like this would be useful in terms of figuring out what returns the most differentiated clusters?

And as a spitball here (I don’t think this is how ML shit works, but): it would be handy to have a couple of sounds and a meta-algorithm where you can specify that these two sounds are different, and it would then iterate over descriptors and statistics to find what most accurately captures that difference between the sounds. Rather than manually testing/tweaking (in the dark).

yes. distances mean nothing in themselves, they are dependent on units and normalisation, so you cannot compare between descriptor spaces… hence the importance of LPT’s sanitisation of the descriptor space before the distances are calculated!
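
A toy Python/numpy illustration of that point, with made-up numbers: the same pair of points gives very different raw distances depending on units, until each dimension is rescaled first:

```python
# Toy illustration with made-up numbers: the same two points give wildly
# different raw distances depending on units, so distances from different
# descriptor spaces are not comparable until each dimension is rescaled.
import numpy as np

a = np.array([100.0, 2000.0])    # pretend: some loudness-ish value and a frequency in Hz
b = np.array([110.0, 2500.0])

print(np.linalg.norm(a - b))               # ~500, dominated by the Hz dimension
print(np.linalg.norm(a / 1000 - b / 1000)) # ~0.5, same data in different units

# min-max scaling each column (what a normaliser does, conceptually) puts
# every dimension on 0..1 before any distance is taken
data = np.vstack([a, b, [90.0, 1800.0]])
lo, hi = data.min(axis=0), data.max(axis=0)
norm = (data - lo) / (hi - lo)
print(np.linalg.norm(norm[0] - norm[1]))
```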

melbands are a (rough) perceptual model of critical bands. you could go online, find how many critical bands there are in the range you care about, and choose that. one way to decide the range you care about could be to try (in reaper) high- and low-pass filters on it and see what happens.
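
For a ballpark figure, Zwicker’s Bark approximation of the critical-band scale can be used to estimate how many critical bands sit in a range; a quick Python sketch (treat the result as a starting point for a band count, not gospel):

```python
# Ballpark only: Zwicker's Bark approximation of the critical-band scale,
# used to estimate how many critical bands sit in a given frequency range.
from math import atan

def bark(f_hz):
    return 13 * atan(0.00076 * f_hz) + 3.5 * atan((f_hz / 7500) ** 2)

fmin, fmax = 200.0, 4000.0
print(round(bark(fmax) - bark(fmin)))   # roughly 15 critical bands between 200 Hz and 4 kHz
```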

something in between is already possible but you will never get a free ride. as ML people say: ‘just add data’

with an immense amount of data, an immense number of descriptors, and immense cpu time, you could do what you want. this is what amazon does. but an in-between is possible with a compromise: you curating and training and tweaking, many descriptors, and a data reduction algo that will remove redundancy in the latter… but no free ride for small batch + time series + low latency + low cpu…

Is there a way to normalize the output relative to the space?

As in, getting a number between 0. and 1. as the output? Just to see what’s matching and why.

I meant more in terms of requesting a high number of melbands for a frequency range that is (potentially) too small (e.g. @numbands 40 @minfreq 1000 @maxfreq 1111). Does that “break” the melband computation or are they just somewhat proportionally distributed in that space?

Obviously to do it well would require lots of time/cpu/data/etc…, but isn’t the principle the same? I meant more in terms of a workflow where, rather than sifting through algorithms and parameters, you pick what you want to be classed as different, and it does the best it can with what it has (similar to your autothreshold picker thing: rather than running it over and over hoping to get the right amount, you specify what you want and let the algorithm iterate). At the moment each permutation of this is very time consuming (and again, error prone) to set up, with very little to tell me how effective it has been.

Even given the two versions that work “well” now: as far as I can tell, both work equally well in terms of results (MFCCs with 512/256 and melbands with 256/128), but without being able to see the clustering, or having a more objective way of measuring success other than feeding it test audio and checking whether it found all the notes or not, I don’t know what to do as a next step, or how to approach it differently.

Obviously there is no free ride, but at the moment it’s shooting in the dark.

Try fluid.normalize and fluid.standardize on your data. There is an example in LPT, and also in the simple learning examples, to see their interface. Let us know whether they make sense to you or not.

no, but you might get too much detail to generalise from. imagine having too much precision on pitch: it won’t help you find trends…
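
A quick Python/numpy check of the narrow-range case asked about above (this uses the generic mel spacing; FluCoMa’s exact filterbank layout may differ slightly): the computation doesn’t break, the band centres just get packed evenly on the mel scale inside the range:

```python
# Asking for 40 bands between 1000 and 1111 Hz does not break anything: the
# centre frequencies are simply spread evenly on the mel scale inside that
# range, here only ~3 Hz apart, which is more precision than is useful.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

numbands, minfreq, maxfreq = 40, 1000.0, 1111.0
centres = mel_to_hz(np.linspace(hz_to_mel(minfreq), hz_to_mel(maxfreq), numbands))
print(np.diff(centres).mean())   # about 2.8 Hz between neighbouring band centres
```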

play with it, make music, and that will give you a feel of what works well with each. Then you might have clearer edge cases that help you train and test the mechanism you implement for your specific task.

Maybe @weefuzzy will have more information here, but shooting in the dark is what it’s all about - otherwise you can use tools where someone else has done that experimental training for you, curating the experiments and the results. Either you try to get a feel, or you look at numbers, and you have both possibilities now - you can even look at it with the same paradigm we have in the learning example (json or graphics).

Your next step is to try data sanitisation and normalisation/standardisation and the interaction with the choice of descriptors. There is very little more than learning how they interact since they all fail in various ways.

you could also use the MFCC clustering app we talked about on another thread and see how it classifies your entries chopped at the length you will feed…

Would that be useful here? At the moment, anyways, I’m feeding in the same kind of data for training that I am matching, so presumably they would be scaled “the same”? Or do you mean getting things like MFCCs to behave more reasonably?

Shooting in the dark means different things to you and me though. You (and the rest of the team) know what’s going on with the algorithms, their implementations, intended use cases, etc… Whereas I’m clueless to (most(/all) of) that.

Do you remember who posted it? Or how long ago? There was that language one this week, but I don’t think that’s the one.

Having some kind of visualization of clustering would definitely be useful though, as with that I could more easily figure out if something is working better or not.

I went back and tested permutations of this (as in, having melbands be the main descriptors but with more stats) and I want to say I got some great results at 256/128 with one of the permutations, but in going back to try to figure out which one, I wasn’t able to reproduce it. Perhaps I had a nice sweet spot of amount of training data or something, but as it stands I’m still getting the best results from MFCCs with 512/256 (and more stats).


This is a vector of truth. My understanding is building candidly on this website and with the PA learning experiments folder. The other two actually know what they are doing. The dialogue between my ignorance (and that of those of you who share similar questions) and their knowledge is the point of this project: trying to find a point where we can discuss and share knowledge and exploration and vision without one person doing all the tasks (coding, exploring, poking, implementing, bending), so we all win in this dialogue. Does that make sense? So the extent of my knowledge is shared here 100%, and it isn’t much, but it keeps our questions focused on what we want. So the playing/poking part is quite important, because it allows you to build a different bias in your knowledge and quest. Otherwise I can send you SciKit Learn tutorials (which I have not done myself, for the reason I gave you) and you can come at it from a data scientist perspective… an in-between could be Rebecca Fiebrink’s class on Kadenze, which I’ve done a part of, but again she offers ways to poke at data.

Now if you want to see your data, you have a problem that we all have: visualising 96D in 2D. see below for some examples of how imperfect that is. Another approach is the clustering you get in our objects for now: it will tell you how many items per cluster, and you can try to see which descriptor gives you 2 strong classes.
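
For the “seeing your data” side, one common (and lossy) way to get a 2D picture of high-dimensional points is a PCA projection; a minimal Python/numpy sketch, where the 96 columns are just a stand-in for whatever descriptor set the dataset actually holds:

```python
# One common (lossy) way to look at high-dimensional data in 2D: a PCA
# projection via SVD. The 96 columns here are only a stand-in for whatever
# descriptor set is actually in the dataset.
import numpy as np

X = np.random.randn(200, 96)           # stand-in: 200 entries, 96 descriptor values each
Xc = X - X.mean(axis=0)                 # centre each dimension
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords_2d = Xc @ Vt[:2].T               # project onto the two strongest directions
print(coords_2d.shape)                  # (200, 2), ready for a scatter plot
```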

This thread is about the app I was talking about. This and this are threads about other visualisation possibilities.

Again, as we discussed: one approach is to try more segregated values (further apart in the space, very different) as your training data; another approach is to train the ambiguous space. I’ve never tried either, but you can now.


and again, XKCD has the answer with Friday’s graph:

Yeah, that’s a bummer. I guess with data reduction you (potentially) lose some of how differentiated things are.

Sadly this appears fucked. I downloaded it (AudioStellar) and set it up, and it just sits on 0% initializing or gives me an “error path not found” whenever I try to load samples.

@jamesbradbury’s example isn’t online anymore either.

I created a mini set of samples so there are 20 of each of center and edge hits, all 512 samples long.
tiny samples.zip (44.1 KB)

Hmmm. At the moment I’m getting pretty solid differentiation when I train sounds that are actually different (e.g. center and rim), but for most of the sounds that are actually different, there’s not really a middle ground or ambiguous space, as there’s a different surface involved. I did try training rim hits nearer the tip of the stick vs nearer the shoulder, and that worked without any mistakes. I didn’t test to see where that difference started though.

The center-to-edge difference, I guess, is the hardest one to tell apart, in terms of regular snare hits, since they sound the most similar. Perhaps this is not the case, but my thinking is that if I can get those working smoothly, the rest will be a piece of cake.

I plan to rectify this soon, as it’s going to be part of a potential journal submission. Stay tuned…


I recommend reading this paper we published under the indisputable authority of @groma … or just look at the graphs, and it’ll tell you what you want to know: indeed, you always lose something, and the various reductions will give you various pros and cons, various affordances - hence the paper proposes to compare them and see which one fits your needs. As @weefuzzy said, ‘It depends’™

I was reminded today of the oldschool NMF-as-KNN approach I was experimenting with early in the process, and came across this post which shows how effective @filterupdate 1 was at refining the filters and selections.

Pre @filterupdate 1: [image]

Post @filterupdate 1: [image]

I’m (fairly) certain this is a naive question, and probably not possible given the algorithms at play, but is there a way to train some points, and then run arbitrary audio through to have it refine the selections based on that input?

Obviously one can add more points to each classification, but perhaps something like this could help catch or fix the edge cases. (It would be amazing if there was a 2D view where the arbitrary audio it was fed is mapped, and you could go through and say “this point is A, this point is B” and have the network update accordingly.)

Yes, you can with machine learning. Soon. You train, add points, train with problem points and/or tweak parameters. @weefuzzy will confirm, but this is what is happening under the hood anyway in nmf, reducing the error by iteratively training… @groma and he keep telling me that nmf is a sort of unsupervised yet seedable classifier… now they will correct me here if I say stupid(er than usual) things
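
For what it’s worth, a very rough Python/numpy sketch of that “seedable” idea as read from this exchange: standard Lee & Seung multiplicative NMF updates, with the bases initialised from templates. Keeping W fixed would correspond loosely to @filterupdate 0, letting it keep updating (as below) to @filterupdate 1. This is a reading of the discussion, not FluCoMa’s actual implementation.

```python
# Rough sketch of the "seedable" idea: plain NMF with Lee & Seung
# multiplicative updates (Euclidean cost), where the bases W start from
# seeded templates and are refined alongside the activations H.
import numpy as np

eps = 1e-9
V = np.abs(np.random.randn(60, 100))        # spectrogram-like data: 60 bins x 100 frames
W = np.abs(np.random.randn(60, 2)) + eps    # 2 seeded templates (one per class, say)
H = np.abs(np.random.randn(2, 100)) + eps   # activations of each template over time

for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)    # update activations
    W *= (V @ H.T) / (W @ H @ H.T + eps)    # refine the seeded templates

print(np.linalg.norm(V - W @ H))            # reconstruction error after training
```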


Great. I’ll wait on the next release to push this further, or rather, refine this further.

It just seemed like an obvious way to refine the selections without “start over and hope for the best”.

maybe the one after. We are devising packages. stay tuned!


I decided to revisit this, as I’ve learned a bunch from the melbands range and smoothing/reduction in this thread, as well as about some problematic fft settings in this thread.

After some quick initial testing (it took me a bit to figure out the new syntax/messaging) I still get better results from the MFCCs than from the melbands. I did, however, now get good results from the MFCCs when using a smaller analysis window of 256 (using @weefuzzy’s suggestion of @fftsettings 256 64 512). I think I get slightly better matching using a larger analysis window of 512, but the tradeoff in latency isn’t worth it.
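
For reference, the latency side of that tradeoff in plain numbers (back-of-envelope Python, assuming a 44.1 kHz sample rate):

```python
# Back-of-envelope latency arithmetic for the window sizes discussed,
# assuming a 44.1 kHz sample rate.
sr = 44100.0
for window in (256, 512):
    print(window, "samples =", round(window / sr * 1000, 1), "ms")   # ~5.8 ms vs ~11.6 ms
```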

So at this point it’s looking like I’m going to be extracting every bit of juice out of those 256 samples (“normal” descriptor stuff/stats, 40 melbands, 12 MFCCs+stats), each for a separate function later on.

Also wanted to give a bump to this section. I don’t remember any specific mention of the semi-supervised tweaking/updating in the last release(s), but the objects have been refactored, so it’s possible it could have happened and I didn’t notice.

not yet. this is a new series of objects that will replace the quick and efficient KNN with neural nets… slower but cleverer. Stay tuned!