Well, I say quick. The video’s still 50 min long, although maybe 25 of those are me saying ‘um’.
We do some audio matching starting from a very minimal setup and then explore a couple of things to improve it. Definitely of the wrap-it-in-gaffer-tape-until-it-works variety.
h/t @tedmoore for pointing out how little scaffolding was really needed to get going
I really like the big button. It is so convenient to click on, and the LED is so big, I can be sure I won’t miss it when it flashes.
But seriously, thanks for the great presentation!! I especially liked the max-pooling part; I’ll have to try it out. Honestly, when it got to the flickering-but-mostly-right level, I thought that was it, since that was my earlier experience of trying something similar with MuBu (though I probably did not use it right). I also thought the relatively silent parts in 1 or 2, or the similar bass register in 2 or 3, would always result in those flickering predictions. I was quite surprised how dead-sure it got with max-pooling.
I also liked that you pointed out the interactive training and playing with the learnrate. Will try that too next time.
(And also, I had no idea about the @carryflag attribute of counter, although I have been annoyed by the default setting since I was a little boy. Now I see my life could have been so happy.)
By the way, what would you think about robustscaling instead of standardising in this context? As far as I understand, robustscale scales the values so that there are fewer “gaps” in the range, but I don’t know if we want that for this purpose or if it would just add noise.
In the general case I think it’s YMMV, and probably worth comparing results. Of course, actually looking at the distribution of values will help make a more informed choice: standardizing assumes that the data distribution is Gaussian enough not to produce nonsense, but outliers will tend to compress the bulk of the scaled data into an unhelpfully small range, and may well fail to centre the data well.
In this particular very dirty case, the samples are so small that the very idea of doing statistics on the data is slightly dodgy anyway: in a sense everything is an outlier! It’s quite possible that the RobustScaler would work better here though: definitely worth a shot. I guess a quick histogram abstraction to be able to inspect the distribution of features could be helpful, although at this scale it’s probably as easy just to swap out the scaler and compare.
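If it helps, here’s a rough illustration of the outlier effect, sketched in Python with scikit-learn rather than anything FluCoMa-specific (the data is made up, just to show the behaviour):

```python
# Illustration only: how one outlier distorts standardisation
# but barely affects robust (median/IQR) scaling.
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

rng = np.random.default_rng(0)
features = rng.normal(loc=5.0, scale=1.0, size=(100, 1))
features[0] = 100.0  # one wild outlier, e.g. a spurious loudness frame

standardised = StandardScaler().fit_transform(features)
robust = RobustScaler().fit_transform(features)

# With the outlier present, the bulk of the standardised data gets
# squashed into a small range around a shifted centre, while the
# robust version keeps the inliers spread over a sensible range.
print("standardised inlier spread:", standardised[1:].std())
print("robust inlier spread:      ", robust[1:].std())
```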
Haven’t thought about that either, but it makes sense.
I am working on an abstraction that could hopefully make max-pooling more customizable on the training side (or for the query side for that matter). But not sure if my math checks out. I got inspired by your real-time “no-latency” max pooling on the query side, and I thought that the problem should be similar to an equal-temperament problem. My idea is to have a frame “decay” to X of its original value in exactly Y steps. I think this way we could control how “long” a certain frame should contribute to the result. Does this make sense?
That makes sense, and looks sensible. The way I normally get there is to start from the exponential case and then plug in the desired decay, i.e. a coefficient a = e^(-1/N) decays its input to 1/e over N steps. Then if we had a different target, we’d say a = e^(ln(target)/N).
You’ve got a = target^(1/N), which turns out to be equivalent:

a = target^(1/N)
=> 1/N = log_target(a) = ln(a) / ln(target)
=> ln(a) = ln(target) / N
=> a = e^(ln(target) / N)
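For what it’s worth, here’s the same arithmetic as a tiny Python sketch (variable names are mine, not from the abstraction). It assumes nonnegative features; the bipolar case comes up below.

```python
import math

def decay_coefficient(target, steps):
    """Coefficient a such that a ** steps == target, i.e. a frame decays
    to `target` of its original value in exactly `steps` frames."""
    return math.exp(math.log(target) / steps)  # same as target ** (1 / steps)

def leaky_maxpool(frames, target=0.01, steps=20):
    """Running 'no-latency' max pool over nonnegative features: each new
    frame competes with the previous pooled value after that value has
    been decayed by the coefficient."""
    a = decay_coefficient(target, steps)
    state = 0.0
    pooled = []
    for x in frames:
        state = max(a * state, x)
        pooled.append(state)
    return pooled

# decay_coefficient(0.01, 20) ≈ 0.794, so after 20 frames a value has
# decayed to 1% of where it started.
```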
I have to digest this, and I might not have enough IQ points to succeed. But feel free to check and modify the abstraction, I am really not that good at math…
Abstraction looks great: convenient and robust, and readable as well. Thanks!
@tremblap pointed out to me earlier today that max pooling probably only makes strict sense for nonnegative data, unless one tries to preserve the signs of the extrema. That looks a bit more fiddly to do in Max, but I might see if I can make a nice complement to this patcher.
Yes I had the feeling too, since now anything will converge to 0. I guess you can offset the values with the detected all-time minimum, and then undo the offset on the output. But that’s hacky.
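Roughly this, sketched in Python rather than Max (untested, just to show why it feels hacky):

```python
def offset_maxpool(frames, decay):
    """Workaround for signed features: shift by the detected all-time
    minimum so values are nonnegative, max-pool, then undo the shift."""
    lo = float("inf")
    state = 0.0
    out = []
    for x in frames:
        lo = min(lo, x)               # detected all-time minimum so far
        state = max(decay * state, x - lo)
        out.append(state + lo)        # undo the offset on the output
    return out
    # Hacky because lo keeps drifting as new minima arrive, which
    # retroactively re-offsets whatever is already in the pool.
```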
Can’t it be both? (Yes, for some of the more looming gaps I just sped up the patching, and also on some of the training runs (but I actually 'fess up on screen to those).)
Just finished watching the video. I didn’t know about max pooling, but it seems like you get really improved results, albeit with some added latency (shh, @rodrigo.constanzo might hear).
Something like a spatial encoder could be of interest to you; there is one in the ml.* package. I think that could be a cool way of imposing some form of “memory” and statefulness onto the training process, perhaps.
How about this one? fluid_maxpool_v2.zip (4.7 KB)
Now it offsets to detected minimum, and undoes the offset on the output. Seems to work but not 100% sure.
Thanks, James! I actually started out with that one, but then it didn’t feel clear how one would use it with lists.
I tried to recreate Owen’s patch with my fluid.maxpool abstraction. Something still might be a bit off, don’t know… But the maxpooling seems to have a good effect nevertheless. owens_mlpclassifier_with_maxpooling.zip (4.9 KB)
Nice one! I think in the sampling you want to reset the pool when you press the bang, but also wait for the @steps frames before adding the point (which is why I had all that stuff with bline).
I have this dirty PoC for a bipolar max, but I have not tested it in the patch. It has the advantage of keeping the actual values, although once it’s all normalised it makes no difference whatsoever.
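The idea, roughly, sketched in Python rather than Max (untested against the patch):

```python
def bipolar_max(a, b):
    """Keep whichever value has the larger magnitude, sign included,
    so negative extrema survive the pooling instead of drifting to 0."""
    return a if abs(a) >= abs(b) else b

def bipolar_maxpool(frames, decay):
    """Leaky max pool on magnitudes, preserving the sign of the winner."""
    state = 0.0
    out = []
    for x in frames:
        state = bipolar_max(decay * state, x)
        out.append(state)
    return out
```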
Great video. Thanks for taking the time to walk us through the entire patch-making and all the musings about options, advantages and the pooling. Learned a lot!