this is exactly what will mess with phase. Here is an example patch that helps you see what you hear (energy smear). Now try with less radical filters: this one is 15th order, but it should show you that a click, the sharpest and shortest of attacks (your favourite), will have a time-domain behaviour now…
I did suggest that a few times in the past, because it is what I do sometimes… it is very ircam-street team to do preprocessing of the signal, optimising for the task… at least it is from their old patches that I learnt that, long, long ago
I’ll check out the phase thing later. Just on a lunch break between Zoomeetings (ugh).
Here’s an audio example if you want to take a stab at it (would love to hear the results and see your thinking):
The left channel is the audio direct from the sensory percussion pickup and the right channel is the Earthworks DM20. Recorded at the same time and with no eq/compression/whatever.
It’s things like flams, press rolls, and faster rolls, with some ‘full kit’ playing at the end for good measure.
I think at the time I was still trying to use their onset detection algorithm anyways, so I was doing their “system” and sending MIDI over, and then the onset descriptors were completely separate. So it wasn’t too worthwhile. The performance I got out of the last test was pretty promising though, hence being back on the audio/audio approach.
the left is super noisy, super high-passed, but just very slightly ahead (I can hear it more than I can see it by the pull in the image)
I’m going to set a threshold that is too permissive, to find where the noise floor is. The idea is to get the noise floor to react (slow env vs fast env) when I start to play (that gives a good idea) - you will notice that it helps it become more nervous - around -50 works for me
then I set the minslice to something not ridiculous (like 2205 aka 20Hz) and I play with the fastrampdown to smooth the tail a bit, in conjunction with on thresh (I have the off thresh about 6dB below to start with) and I get quite good results…
My thinking: top heavy source is what I want. So nervous in, scrap the low end, and surf that peak.
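For anyone following along without the patch, here is a rough sketch of the kind of fast-vs-slow envelope slicer those settings drive. It is not the actual fluid.ampslice~ code: the function and parameter names (`detect_onsets`, `fast_up`, `slow_down`, and so on) are made up for illustration, and the values only loosely echo the ones mentioned above (a floor around -50 dB, an off threshold 6 dB below the on threshold, a min slice of 2205 samples).

```python
import math

def db(x, floor_db):
    """Linear amplitude to dB, clamped so it never drops below floor_db."""
    if x <= 0.0:
        return floor_db
    return max(20.0 * math.log10(x), floor_db)

def detect_onsets(signal, fast_up=5, fast_down=50, slow_up=2205, slow_down=2205,
                  on_thresh=12.0, off_thresh=6.0, floor_db=-50.0, min_slice=2205):
    """Return sample indices where the fast envelope jumps above the slow one.

    All ramp lengths are in samples. Every name and default here is an
    assumption for this sketch, not the real object's attribute list.
    """
    def coeff(n):
        # One-pole smoothing coefficient for a ramp of roughly n samples.
        return math.exp(-1.0 / max(n, 1))

    fast = slow = floor_db      # both followers start at the floor
    armed = True                # hysteresis: must drop below off_thresh to re-arm
    last_onset = -min_slice
    onsets = []
    for i, x in enumerate(signal):
        level = db(abs(x), floor_db)
        c = coeff(fast_up if level > fast else fast_down)
        fast = c * fast + (1.0 - c) * level
        c = coeff(slow_up if level > slow else slow_down)
        slow = c * slow + (1.0 - c) * level
        diff = fast - slow      # how far the fast follower is ahead, in dB
        if armed and diff > on_thresh and i - last_onset >= min_slice:
            onsets.append(i)
            armed = False
            last_onset = i
        elif not armed and diff < off_thresh:
            armed = True
    return onsets
```

The shorter `fast_up` is, the more nervous the detector gets, while `min_slice` and the on/off gap are what stop it from retriggering on the tail.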
Patch (dumb) and picture of settings (good) attached. Comments and questions welcome.
The sound is indeed noisy and highpassed, with a real high-pitched squeal, which is something I had made a thread about a while back in terms of trying to “correct” for that before doing anything else.
What I normally do is have the vanilla Max version open at the same time and massage those settings as I can see the envelope output of that and make decisions accordingly. I still haven’t gotten a solid understanding of how the fast/slow envelope times have an impact on the peaks.
So with your tweaking of the settings (verbose style), are you doing that blind? Just poking at the numbers and… imagining what it’s doing?
edit:
I’m now wondering what other kind of pre-processing I could do with that signal to better improve it for “just” onset detection. Obviously the bulk of it is in the high frequency range, so I may try something where I aggressively highpass and maybe even bandpass to see if that improves the results.
Would looking at something like an expander help out too? Or would that just add latency (and potentially complicate things).
Man, that’s crazy! I’m not up on phase stuff (other than knowing, much like impedance, and whether to use effect or affect, the rule of thumb is “it is always the wrong one”), but I imagine it shouldn’t be doing what it’s doing there, heh.
Ok, after a quick initial test. It seems that a hearty bump at 5k (with a shitty biquad~) improves the tracking. I’ll test it across a bunch of material to see if that holds up, but that could perhaps be useful.
It’s kind of tricky to check for this, so I moved the filtergraph~ around a bunch, cutting the frequency initially, to see where it worked the worst, and then boosted around there and roamed around to see if that helped.
Looking at spectrumdraw~ it looks like there’s a lot of energy in the 4-5k range anyways, so it could be that that’s the resonant peak of the hall sensor or whatever it is that’s going on inside the unit.
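If you want to try the same bump outside of Max, here is a minimal sketch of a peaking biquad using the well-known RBJ "Audio EQ Cookbook" formulas. The 12 dB gain, the Q of 2 and the 44.1 kHz sample rate are guesses for illustration only; the actual filtergraph~ settings are not given above.

```python
import cmath
import math

def peaking_biquad(f0, gain_db, q, sr=44100.0):
    """RBJ cookbook peaking EQ. Returns (b0, b1, b2), (a1, a2), normalised by a0."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / sr
    alpha = math.sin(w0) / (2.0 * q)
    b0, b1, b2 = 1.0 + alpha * A, -2.0 * math.cos(w0), 1.0 - alpha * A
    a0, a1, a2 = 1.0 + alpha / A, -2.0 * math.cos(w0), 1.0 - alpha / A
    return (b0 / a0, b1 / a0, b2 / a0), (a1 / a0, a2 / a0)

def biquad_filter(x, b, a):
    """Direct form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out

def gain_db_at(f, b, a, sr=44100.0):
    """Magnitude response in dB at frequency f, to check where the bump lands."""
    z = cmath.exp(-1j * 2.0 * math.pi * f / sr)
    h = (b[0] + b[1] * z + b[2] * z * z) / (1.0 + a[0] * z + a[1] * z * z)
    return 20.0 * math.log10(abs(h))

# Hypothetical settings: +12 dB at 5 kHz, Q = 2.
b, a = peaking_biquad(5000.0, 12.0, 2.0)
```

`gain_db_at(5000.0, b, a)` is a quick way to confirm the boost sits where you think it does before running the pickup signal through `biquad_filter` and into the onset detector.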
I knew we’d get you there! This is fantastic! I hope it’ll behave with the kit around though: this is so nervous I’m afraid it’ll be trigger-happy, but you might be able to adjust floor dynamically… watch this idea: what if floor was taken from your bd or oh mic, which would tell you how loud the non-snare is!
I have the floor set pretty high here, and in the middle of the video I slap the snare around a bit with my hands and no false triggers. For the time being my focus would be on just having a “super snare”, and then seeing how it behaves in a kick.
Although I will do some testing in a bit with a patch where I’m using 3 modular voices driven by the hi-hat contact mic thing I made a while back, this on the snare, and a kick trigger (with the hat and kick being corrected via convolution).
Damn this is super toight. Amazing that this is massive effort over so many months (years?!?!). Just really cool to keep tabs on this thread even though onset detection isn’t something I do much.
Even @Angela, after filming the video, was like “you got your onset detection stuff going finally!”
Oh, and looking through my dirty patch for the video, the click~ you hear in the video is post the onset descriptors stuff, so the latency from the video includes the 512-sample delay and associated crunching/processing time of the fluid.bufobjects~(!!).
It’s funny, in going through and trying to put together a patch that triggers that big metal sample library from the performance, I now have a problem where I’m triggering toooo many samples to be intelligible. So I need to find ways to leverage more interesting (and shorter) contours and envelopes to harness this great onset-y power.
Not sure if this is helpful: I used to dynamically change the trigger threshold for piano notes as an offset over the averaged energy of the past 500ms of sound. Therefore, I’m tracking amplitude differences, rather than expecting an absolute level in order to trigger. It was all done in plain Max land. If you are interested, I could dig it out.
Just thought, you might want to adapt your ‘floor’ along those lines.
Hans
@tutschku this is in effect what the algo is doing under the hood. If you set the slow envelope to 22050, that will be your average of the last 500ms at CD sr, and then you can play with the other envelope to make it as fast/slow as you want… the advantage of floor is that it also provides an absolute floor below which these threshold differences are ignored.
Another reason why we ported this in C was that we can do it on buffers, for batch processing, and in real-time in a quite efficient way too. @rodrigo.constanzo has played with my many max versions already, which whetted his appetite since there were a few prototypes that worked pretty well - I’m happy that he is at last happy
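To make the "offset over the average of the past 500ms" idea concrete, here is a small sketch of a relative trigger. It uses a plain moving average of dB levels rather than a one-pole slow envelope, so it is a simplification of both the plain-Max version and the C implementation; `relative_trigger`, `offset_db` and the other names are invented for this example.

```python
from collections import deque

def relative_trigger(levels_db, window=22050, offset_db=9.0, floor_db=-60.0):
    """Trigger when the level exceeds the running average by offset_db.

    levels_db: per-sample levels in dB. window: averaging length in samples
    (22050 is 500 ms at 44.1 kHz). All names and defaults are assumptions.
    """
    history = deque()
    total = 0.0
    above = False
    triggers = []
    for i, level in enumerate(levels_db):
        level = max(level, floor_db)            # absolute floor
        avg = total / len(history) if history else level
        is_above = level > avg + offset_db      # relative, not absolute, test
        if is_above and not above:
            triggers.append(i)                  # rising edge only
        above = is_above
        history.append(level)
        total += level
        if len(history) > window:
            total -= history.popleft()
    return triggers
```

Because the comparison is against the recent average, a quiet passage and a loud passage can both trigger with the same `offset_db`, which is the point of tracking differences instead of an absolute level.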
Still trying to wrap my head around those parameters.
Here I have a series of piano notes. The slow envelope is very slow and still some of the attacks are getting subdivided.
I don’t have your sample to test, but 5-200 (in samples, don’t forget) seems to be on the very nervous side of things. For piano I’d expect a slower decay at least, in the ballpark of the minslicelength (which again is optimistic: do we really want to have potentially 2 attacks 22ms apart? Maybe, but for most applications 10 attacks per second is quite a lot… but this is why we give access to all parameters). @rodrigo.constanzo wants that level of nervousness, so now he has it
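Since all of these values are in samples, a couple of throwaway helpers can help when translating between samples, milliseconds and attacks per second. The 44.1 kHz default is an assumption; plug in whatever rate the patch actually runs at.

```python
def samples_to_ms(n_samples, sr=44100):
    """Duration of n_samples in milliseconds at sample rate sr."""
    return 1000.0 * n_samples / sr

def ms_to_samples(ms, sr=44100):
    """Nearest whole number of samples for a duration in milliseconds."""
    return round(ms * sr / 1000.0)

def max_onsets_per_second(min_slice, sr=44100):
    """Upper bound on detections per second for a given minimum slice length."""
    return sr / min_slice

# Values mentioned in this thread, at an assumed 44.1 kHz:
print(samples_to_ms(2205))          # min slice of 2205 samples, in ms
print(samples_to_ms(22050))         # slow envelope of 22050 samples, in ms
print(max_onsets_per_second(2205))  # the "aka 20Hz" figure
print(ms_to_samples(22))            # 22 ms expressed in samples
```

By the same arithmetic, ramps of 5 to 200 samples last from roughly a tenth of a millisecond to under 5 ms, which is why they sit on the nervous side.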
Inspired by @rodrigo.constanzo 's questions and my own struggle to make sense of those parameters, I was poking into the available documentation in search of a ‘verbose’ description of the different slicing processes. You, @tremblap, and your team are deeply in that world and take some concepts for granted. They might not be as ‘visible’ to others at this point. As you mentioned many times in past conversations: you are also interested in the gaps of information and the hurdles a user encounters while using the tools. Well, this is one of mine: understanding what those different slicing algorithms are REALLY doing. The documentation mentions slow and fast envelopes, without going any further. If I have missed the location of those explanations, please point me to them.
And how does floor influence all of this? Is any sound dropping below it ignored altogether? Is it a type of gate? Where would that gate be in the processing chain?
@tremblap obviously helped a ton in getting my settings fine-tuned, but to get it part of the way there I played with the Max-version of the algorithm to see the impact that the different parts had:
Even then, it only helps for getting a ballpark of the thresholds and a bit of the envelope speeds; it was still a lot of poking and testing, in the blind.
I’m sure the others (@tremblap, @weefuzzy, @groma) may have a more technical or meaningful response here, but my understanding is that since we’re in dB-land, silence is infinitely quiet, which means the envelope followers get sucked into infinity, and it takes them time to come back up. So @floor caps the bottom of this, in a way that’s immediately useful for detecting repeated attacks coming from silence. More functionally, it is a floor below which onsets won’t get detected at all.
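Here is a toy illustration of that "sucked into infinity" intuition, assuming a simple one-pole follower running on dB values. It reflects my understanding only, not the internals of the actual object, and `to_db` and `follow` are names invented for the example.

```python
import math

def to_db(x, floor_db=None):
    """Linear amplitude to dB. Digital silence maps to -inf unless a floor is given."""
    level = 20.0 * math.log10(abs(x)) if x != 0.0 else -math.inf
    return level if floor_db is None else max(level, floor_db)

def follow(levels_db, ramp=10):
    """One-pole envelope follower over dB values; ramp is in samples."""
    c = math.exp(-1.0 / ramp)
    env = levels_db[0]
    out = []
    for level in levels_db:
        env = c * env + (1.0 - c) * level
        out.append(env)
    return out

# 100 samples of true digital silence, then 100 samples at full scale.
signal = [0.0] * 100 + [1.0] * 100
no_floor = follow([to_db(x) for x in signal])
with_floor = follow([to_db(x, floor_db=-60.0) for x in signal])

print(no_floor[-1])    # stuck at -inf: the follower never climbs back
print(with_floor[-1])  # close to 0 dB: it recovers from the -60 dB floor
```

In this toy version the unfloored follower never recovers at all, and even with a very low finite floor it would take longer to climb back, so an attack coming out of silence is much easier to catch once the bottom is capped.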
So at the moment, I mainly massage that variable when I’m tuning the onset detection for my snare. The rest, including thresholds, kind of stay put.