awwwwww.
Just thinking out loud about some use cases to test this out with, and one that comes to mind is trying to do a low (zero?) latency NMF-style decomposition.
For the snare/friction/feedback stuff I’ve been doing, I started experimenting with a 3-component NMF filter and switching between the components quickly. I don’t have a video of it with snare stuff, but it’s basically this effect, only on friction/noise snare instead of sax:
(timestamp 50:25)
This works and sounds good, but the lowest FFT size that I can do with the NMF sounding even remotely ok is @fftsettings 512 64 512, which is fairly small as far as FFT sizes go, but gigantic when it comes to the shit I’m doing, and feedback-based stuff in particular.
So in watching your vid I see that you load models where there is an mc. input for parameter control. What I was thinking is training something up where, when the second signal is 0, the target is NMF component 1; when the second signal is 0.5, the target is NMF component 2; and when the second signal is 1, the target is NMF component 3.
Presumably this would also create a regressor where I could then interpolate between the components (which is cool in and of itself), but what I’m unsure about is the training part. Unless I overlooked it (possible, as I wasn’t following the long Python argument sections super closely), you only showed how to train up static models (sine->saw and clean->distorted), though you did show how to convert models you found.
So I guess I have a few questions.
- Does this sound like a plausible use case? (effectively a dynamic complex filter)
- Would the latency for the processing be zero samples (sample in sample out)?
- Is it possible to train something like this where you have multichannel audio files?
- Have you created your own quirky effects this way?
- Have you experimented with doing that kind of varispeed thing?
I guess the idea would be to have a 2-channel audio file as the input, where the audio I’m training on loops 3 times (in my case) and the second channel is filled with 0, 0.5, and 1 for each respective loop. The target audio file would also loop 3 times, but with each NMF component in the corresponding time slot.
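To make that concrete, here’s roughly how I picture building the two files (a quick numpy/soundfile sketch on my end; the filenames are just placeholders, and I’m assuming the clean audio and the three NMF renders are all mono and the same length):

```python
# rough sketch of the training-file layout I have in mind
# (placeholder filenames; assumes all files are mono and the same length)
import numpy as np
import soundfile as sf

clean, sr = sf.read("snare_clean.wav")            # the source material
comps = [sf.read(f"nmf_component_{i}.wav")[0]     # the 3 NMF-filtered renders
         for i in range(1, 4)]
knob_values = [0.0, 0.5, 1.0]                     # conditioning value per loop

# input.wav: ch1 = clean audio looped 3x, ch2 = a constant "knob" value per loop
in_audio = np.concatenate([clean] * 3)
in_knob = np.concatenate([np.full(len(clean), k) for k in knob_values])
sf.write("input.wav", np.stack([in_audio, in_knob], axis=1), sr)

# target.wav: the matching NMF component in each time slot
sf.write("target.wav", np.concatenate(comps), sr)
```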
Is that the idea for how that would/should work?
I have no idea if this would sound right, but it might be able to work. You would need to do an LSTM or GRU for this.
The thing to do first would be to train this on one channel of NMF to see if it is possible for the model to emulate that circuit. You want to follow the tutorial at 1:45:00 - Training your Own Distortion Models. Use a clean input as your input.wav and one channel of the NMF as your target.wav. I would probably use an LSTM or GRU with a hidden size between 20 and 60.
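If it helps to see the shape of it, the single-channel version is basically this (a bare-bones PyTorch sketch of the sample-in/sample-out idea, not the actual training script from the video):

```python
# minimal sample-to-sample LSTM trainer, just to show the shape of the problem
# (not the script from the video; assumes input.wav and target.wav are mono and equal length)
import torch
import torch.nn as nn
import soundfile as sf

x, sr = sf.read("input.wav")
y, _ = sf.read("target.wav")
x = torch.tensor(x, dtype=torch.float32).view(1, -1, 1)   # (batch, time, features=1)
y = torch.tensor(y, dtype=torch.float32).view(1, -1, 1)

class SampleLSTM(nn.Module):
    def __init__(self, hidden=32):                 # somewhere in the 20-60 range
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)
    def forward(self, x):
        h, _ = self.rnn(x)
        return self.out(h)

model = SampleLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seg = sr // 2                                      # half-second training chunks
for epoch in range(50):
    for i in range(0, x.shape[1] - seg, seg):
        pred = model(x[:, i:i + seg])
        loss = nn.functional.mse_loss(pred, y[:, i:i + seg])
        opt.zero_grad()
        loss.backward()
        opt.step()
```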
If this works, then you would train a model on two inputs, where the first input is the clean signal and the second input is a “knob” that allows you to scroll between different targets. Right now, my script doesn’t do this, so you would have to use the guitarML version of the Automated Guitar Amp Modeling repository, where he achieves this. There is a video here on how to do it with a gain knob:
Try the 1 NMF thing first to see if the more complicated training is worth your time. I’d love to know if it works.
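For what it’s worth, the only structural change in the two-input version is that the network sees two features per time step (the clean sample plus the knob value) instead of one. Conceptually something like this (a sketch of the idea, not the code from that repository):

```python
# same idea as above, but each time step is a (clean sample, knob value) pair
import torch.nn as nn

class ConditionedLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)  # 2 = audio + knob
        self.out = nn.Linear(hidden, 1)
    def forward(self, x):          # x: (batch, time, 2)
        h, _ = self.rnn(x)
        return self.out(h)         # one output sample per time step
```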
EDIT: I would train it with the skip connection off.
Sam
That’s a much more sensible idea…
I guess for the kind of material I’m doing, the sounding results are closer to a static filter, so I would imagine it should be able to capture that somewhat well.
Towards that end, do you reckon it would help to run some of that sweep/noise stuff through it (in addition to the actual snare sounds) to maximize the chances of it converging?
I suppose the idea, in order for it to generalize properly, would be to cover a significant range of material in an efficient and compact way.
Sorry. I missed your last post. Yeah. Most of the training examples I have seen have a couple different white noise bursts at the beginning and some oscillator sweeps. The guitar distortions tend to have mostly guitar as the source, but they translate well to other things. Ideally they would have all kinds of sounds.
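If you want to roll your own, something along these lines gives you the bursts and a sweep to prepend to your source material (a quick numpy/scipy sketch; the durations, levels, and sample rate here are arbitrary choices):

```python
# quick sketch for generating noise bursts + an oscillator sweep to prepend to the training input
# (durations, levels, and sample rate are arbitrary)
import numpy as np
import soundfile as sf
from scipy.signal import chirp

sr = 48000
rng = np.random.default_rng(0)

# a few short white-noise bursts at different levels, separated by silence
bursts = []
for level in (0.1, 0.3, 0.7):
    bursts += [level * rng.uniform(-1, 1, sr // 4), np.zeros(sr // 4)]

# a 5-second logarithmic sine sweep from 20 Hz to 20 kHz
t = np.linspace(0, 5, 5 * sr, endpoint=False)
sweep = 0.5 * chirp(t, f0=20, t1=5, f1=20000, method="logarithmic")

sf.write("calibration.wav", np.concatenate(bursts + [sweep]), sr)
```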
Not sure how much you messed with this, but I imagine you would want to get the NMF filters then send the sounds through those before training…but that brings up the question of why you wouldn’t just do that in real-time instead.
OK, so I have finally gotten the RTNeural object into a “beta” state. For some reason, Pd on Windows isn’t happy, but otherwise things are working. Hopefully I didn’t introduce a giant bug at the last second, haha.
Anyhow, beta is here:
If you find a bug, feel free to post a question here, but you could also just report it on GitHub.
The latest change hopefully allows a decent approach to LSTM note/dur prediction. It allows you to send lists of N samples (the network is trained on N samples of history) and it will predict the next note or duration or whatever, outputting a softmax that represents the weighted probabilities of all possible outcomes in the training.
I made two videos, one in Max and one in SC, showing this:
The training is not super complicated. I am basically just implementing the “LSTM with Variable-Length Input to One-Char Output” example from here:
I am sure there are better approaches, but this shows that the external works and can be trained to predict notes and rhythms.
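For anyone curious, the shape of that training is roughly as follows (a toy Keras sketch along the lines of that tutorial, with a made-up note list; it is not my exact script):

```python
# toy version of the variable-length-history -> next-note training
# (made-up note list; just the pattern from that tutorial, not my exact script)
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

notes = [60, 62, 64, 65, 67, 65, 64, 62, 60]      # toy MIDI sequence
vocab = sorted(set(notes))
to_ix = {n: i for i, n in enumerate(vocab)}

# variable-length histories (1..N previous notes) -> the next note
max_len = 4
X, y = [], []
for i in range(1, len(notes)):
    hist = notes[max(0, i - max_len):i]
    X.append([to_ix[n] for n in hist])
    y.append(to_ix[notes[i]])

X = pad_sequences(X, maxlen=max_len) / float(len(vocab))   # zero-pad and normalize
X = X.reshape(len(X), max_len, 1)
y = to_categorical(y, num_classes=len(vocab))

model = Sequential([LSTM(32, input_shape=(max_len, 1)),
                    Dense(len(vocab), activation="softmax")])  # probabilities over all outcomes
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=200, verbose=0)
```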
Sam