Pre-processing for (Training for real-time NMF)

So in the course of preparing files and testing for the processes discussed in the Training for real-time NMF thread, I’ve been using some computationally expensive “off-line” iZotope processing on the audio coming from the Sensory Percussion sensors.

The raw audio from these is pretty noisy, with a fair amount of hiss and a natural EQ that hypes the high-mids. I imagine the frequency response, and more importantly the transient response, is part of what makes them effective, but the noise is probably problematic.

So in prepping the files for training, I’ve used iZotope on them to denoise and dehum, along with some highpassing at a fairly high cutoff.

This works fine, but in terms of doing this in real time, there are probably better ways to go about it.

Below are some examples of the audio (before, and my “after”) so you can hear it, but is it just as simple as slapping on some EQ? Is it worthwhile making an IR of what I imagine is a fairly consistent “noise” coming from the sensor, and deconvolving that against the live input?

Basically I need to massage this signal some, and I pretty much plan on using it only in a FluCoMa-y (at the moment, NMF-y) context (only for training/matching, never as “audio”). So I figured I could just massage it in the direction that would produce the most “meat” for the initial classification, and then real-time matching algorithm(s).

Obviously the noise/hiss isn’t useful, but I wouldn’t want to throw out too much of the high end, as there’s useful stuff up there.
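One cheap-ish middle ground between plain EQ and full iZotope-style processing would be basic spectral subtraction: learn an average noise magnitude from a noise-only stretch, subtract it per bin, and keep the phase. A rough numpy sketch of the idea (hand-rolled STFT helpers and function names are mine, not any FluCoMa object):

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Hann-windowed STFT, frames along axis 0."""
    win = np.hanning(n_fft)
    frames = [win * x[i:i + n_fft]
              for i in range(0, len(x) - n_fft, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(X, n_fft=1024, hop=256):
    """Overlap-add inverse of the stft() above."""
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(X) - 1) + n_fft)
    for i, frame in enumerate(np.fft.irfft(X, axis=1)):
        out[i * hop:i * hop + n_fft] += frame * win
    return out

def spectral_subtract(noisy, noise_only, floor=0.05):
    """Subtract the average noise magnitude per bin, keeping phase."""
    profile = np.abs(stft(noise_only)).mean(axis=0)  # learned noise profile
    X = stft(noisy)
    mag = np.maximum(np.abs(X) - profile, floor * np.abs(X))
    return istft(mag * np.exp(1j * np.angle(X)))
```

Unlike a lowpass, this leaves transient high-end content mostly alone, since only the steady per-bin average gets removed.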

Here is the unprocessed “raw” audio:
Sensory Percussion (957.1 KB)

Here’s a nice and cleaned “after” audio:
Sensory Percussion (942.9 KB)

I assume the noise is going to register as ‘noise’ if you used fluid.hpss~ or something clever in FrameLib like a narrowband/wideband filter. You could then just play with levels to get the balance that triggers the right transient detection while removing the unwanted component. The deconvolve method sounds fun though.

Well that’s the thing. There’s lots of clinical ways to get rid of the noise, many/most computationally expensive (including the deconvolution).

Basically, in creating the offline classification data this doesn’t matter, but in its proper context this audio processing will be running all the time (along with the associated nmf-ing), so I want to figure out the simplest/cheapest way to clean that audio up in a way that is still useful for classification and matching.

Have you found that it drastically affects the classification? To me, it sounds like just high passing it would do enough to make it trivial.

I hadn’t actually thought about it until recently. For the purposes of classification I uber cleaned the samples, but only just realized that that’s probably not the best idea since the live input it will match against isn’t going to get the same treatment.

I was initially going to clamp down on the ends (fairly severe highpass, and milder lowpass), but was mainly wondering if I should keep things intact at or above a certain frequency range, given a 64-sample FFT.


If your sampling rate is 44100 Hz, that generates a frequency resolution of ~689 Hz per bin. My understanding of how we lose resolution in the NMF process, since we discard the phase, is quite blurry. Also, don’t forget that what you want is to discern between a potentially wide range of fundamentals, so keeping some data in the low-mids would make sense for creating different profiles for your dictionary, I think. I would not low-pass at all, for the same reason.
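To put numbers on that resolution point: bin width is just the sampling rate divided by the FFT size, so a 64-sample FFT leaves only a couple of bins below 1.5 kHz. A quick check:

```python
def bin_resolution(fs: float, n_fft: int) -> float:
    """Width of one FFT bin in Hz: fs / n_fft."""
    return fs / n_fft

# bins available up to Nyquist is n_fft // 2 + 1
for n_fft in (64, 128, 256, 512, 1024):
    print(n_fft, n_fft // 2 + 1, round(bin_resolution(44100, n_fft), 1))
```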

Out of curiosity, what is your HP cutoff?

In my example above there’s a 48 dB/oct highpass starting at 100 Hz, so fairly mild:

My concern with the low-passing is that the signal is super noisy, like tons of hiss and a really high pitched spike (which may be a calibration or “sensor plugged in” detection sound).

Wouldn’t something like that potentially confuse the nmf-ing? Or at least, be a waste of iterations trying to match it?

I think that the nmf should be fed the same signal as the training, and you should optimise the range of the spectra to what is different between your ‘classes’. So yes, if all classes share the same whine and noise, filtering makes sense.

Yeah that’s the idea. I just didn’t want to use something as expensive as iZotope on the audio that would be used in the matching.

So would you say a straight filter? (wide bandpass, maybe with a narrow one on the high pitched squeal)
Or would you go fancier, with something trained on the noise?

I would start by trying the easy and computationally cheap option (cascade~ with computed filter coefficients baked in via filterdesign~) to see if I could get it to work.

Then I would probably take the noise and use the HIRT to make an FIR filter, if I was not getting the results I wanted.


Cool, I’ll give that a go. I started cobbling some filterdesign stuff together but then decided to wait to hear back from this thread.

A bit of a silly/naive question, but is it possible to “cascade” filterdesign-ed filters? Like can I craft 3 filters with filterdesign (highpass, lowpass, and notch for the squeal) and then sum them together for a single cascade~ object?

Yes. Just concat the lists of coeffs you get in the ‘cascade’ entry of the filterdesign dict.


Wait, as in dict-unpacking the output of each individual filterdesign and concat-ing them into a really long list of coeffs?

As in

filterdesign @highpass => list of coeffs1
filterdesign @lowpass => list of coeffs2
filterdesign @notch => list of coeffs3


coeffs1 + coeffs2 + coeffs3 => cascade~ coeffs inlet?

not + like adding them item-wise, but just a concat. Try it with 2 simple ones and you’ll see - it works :wink:
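The same concat idea, expressed in scipy terms for anyone following along outside Max: each designed filter is a stack of second-order sections, and cascading them is literally just stacking the sections into one list, which is then evaluated as a single chain. The specific cutoffs here are illustrative choices of mine, not from the thread’s patches:

```python
import numpy as np
from scipy.signal import butter, sosfreqz

fs = 44100
# three separately designed filters, each as second-order sections
hp = butter(5, 140, 'highpass', fs=fs, output='sos')
lp = butter(8, 18000, 'lowpass', fs=fs, output='sos')
notch = butter(2, [20250, 21000], 'bandstop', fs=fs, output='sos')

# "just a concat, not an item-wise add": stacking the section lists
# is exactly what feeding one long coeff list to cascade~ does
chain = np.vstack([hp, lp, notch])

w, h = sosfreqz(chain, worN=4096, fs=fs)  # combined frequency response
```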

or check this amazing example made with so much love:


Yeah sorry, that’s what I meant.

Awesome, I’ll give that a play.

Looking at this example I love to think that you wrote the coefs without any help from filterdesign~.

A sort of ‘perfect filter’… :wink:


all these math lessons and that ear training have to give me something :wink:

that would be perfect z transform IIUC, as I do work towards a real ‘perfect filter’ in my EQ training :wink:


Hehe, it doesn’t nearly have enough 1 0 -1 0 1 1 -1 -2 in there, otherwise I would agree!

Ok, so I tested putting together a few filterdesign filters yesterday and the results aren’t great.

The low end (140 Hz highpass, 5th-order Butterworth) and the high squeal (20250–21000 Hz bandstop, 2nd-order Butterworth) are easy to take care of, but the overall “noise” is hard to get rid of. Even a 20th-order Butterworth at 14k lets lots of noise through (and obviously cuts a lot of the highs).

This is what the noise looks like on its own:

Some 50 Hz hum (plus what looks like a ton of harmonics), that super high squeal, and lots of noise throughout.

So in terms of trying out the HIRT approach, firstly, would it be significantly more expensive than trying to filterdesign a solution? (I can still probably hand-make a bunch of bandstops for the hum + harmonics)
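Side note on the hum: rather than hand-making each bandstop, the comb of notches can be generated, one narrow section per harmonic, and stacked into a single cascade. A scipy sketch (the fundamental, harmonic count, and Q are my guesses, to be tuned by ear):

```python
import numpy as np
from scipy.signal import iirnotch, sosfilt

def hum_comb(fundamental=50.0, n_harmonics=10, q=35.0, fs=44100):
    """Stack one narrow notch per hum harmonic into an SOS cascade."""
    sections = []
    for k in range(1, n_harmonics + 1):
        b, a = iirnotch(k * fundamental, q, fs=fs)
        sections.append(np.hstack([b, a]))  # one [b0 b1 b2 a0 a1 a2] row
    return np.array(sections)

# apply the whole comb in one pass
sos = hum_comb()
```

The narrower the Q the cheaper it is on the spectrum, at the cost of longer ringing on each notch.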

And secondly, which object handles extracting an IR from an audio file? As in, the equivalent of the Learn button in iZotope.

I’ve got the crop -> average -> invert -> minimum phase -> truncate workflow from the 2nd HIRT video, but I’ve only really used the toolbox when dealing with sweeps and physical measurement.
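For reference, that crop → average → invert → minimum phase → truncate chain can be sketched in numpy as I understand it: average the noise magnitude spectrum, invert it with a regularisation floor, and fold the real cepstrum to get a minimum-phase FIR. This is my rough reconstruction of the idea, not HIRT’s actual implementation (the floor value in particular is my own addition):

```python
import numpy as np

def minimum_phase_fir(target_mag, n_taps=256):
    """Minimum-phase FIR matching a target magnitude (cepstrum fold)."""
    n = len(target_mag)                       # rfft-length spectrum
    full = np.concatenate([target_mag, target_mag[-2:0:-1]])
    cep = np.fft.ifft(np.log(np.maximum(full, 1e-8))).real
    fold = np.zeros_like(cep)
    fold[0] = cep[0]
    fold[1:n - 1] = 2 * cep[1:n - 1]          # fold negative quefrencies
    fold[n - 1] = cep[n - 1]
    h = np.fft.ifft(np.exp(np.fft.fft(fold))).real
    return h[:n_taps]                         # truncate

def inverse_filter_from_noise(noise, n_fft=1024, n_taps=256):
    """Average the noise magnitude, invert it, make a min-phase FIR."""
    frames = noise[:len(noise) // n_fft * n_fft].reshape(-1, n_fft)
    avg = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    inv = 1.0 / np.maximum(avg, avg.max() * 1e-3)  # regularised invert
    return minimum_phase_fir(inv, n_taps)
```

The resulting FIR would then run as a fixed convolution on the live input, which is a single known cost per block rather than an iterative denoiser.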

Here’s a sample of the noise on its own: (679.0 KB)

First, a little bump (for @tremblap) about extracting an IR from an audio file, as it would be good to revisit the filtering aspect of this process.

And more excitingly, the new(ly fixed) @filterupdate 1 is having some nice results on my dicts.

Just the raw @ranks:

And some @filterupdate 1 with @iterations 1000:

I still haven’t tested the matching itself (for reasons expressed elsewhere), but this is a promising step.

I also want to test out what @groma suggested in the HPSS thread:

I don’t understand what you’re trying to get. The dicts are a filter design already, i.e. the amplitude of each FFT bin. What do you want to do exactly?

This is very good. I will re-run my piano ones soon :wink: