So what kind of input/output will your Max thing do? Buffers?
I was wondering, after posting this, if audio writing/reading can happen at audio rate, with the right threading.
Working on a project with a buddy for the upcoming NIME about using a genetic algorithm to crawl a synth space, then have a “timbre analogy” created from realtime input (basically a slightly more lofi but more generalizable version of this approach using DDSP).
As part of that he’s built a thing where he can train up a regressor in Python and then dump out a .json that can be loaded into the FluCoMa stuff, with the structure/activations all matching.
Once that’s up and running will share that as well.
Audio or a list. It is designed for DSP processing so the MSP object will output multichannel audio. The Max object will output a list. In pd you snake the audio together then unsnake the output. Hopefully that will be as easy in Max.
I like a triggerable audio object though. I hadn’t thought of that. That would still send audio as its output, but it will only run inference when triggered. So you could get a changing output faster than the control rate, but not have to do thousands of calculations every sample. Seems reasonable. Is that what you are imagining?
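Roughly, the idea would be something like this (a conceptual sketch only, not the plugin code; it assumes a `model` callable that maps one input frame to one output frame):

```python
# Conceptual sketch of "triggerable" audio-rate inference:
# the output stays at audio rate, but the network only runs on samples where
# the trigger signal is nonzero; in between, the last result is held.
import numpy as np

def process_block(model, inputs, trigger, last_out):
    """inputs: (n_in, n_samples), trigger: (n_samples,), last_out: (n_out,)"""
    n_out, n_samples = last_out.shape[0], trigger.shape[0]
    out = np.empty((n_out, n_samples))
    for s in range(n_samples):
        if trigger[s] != 0.0:                  # impulse: run inference now
            last_out = model(inputs[:, s])     # assumes model maps (n_in,) -> (n_out,)
        out[:, s] = last_out                   # otherwise sample-and-hold
    return out, last_out
```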
The use case I’m working on just needs to happen once per onset, so it would be ideal to have the whole signal path stay in the signal domain for speed/timing.
Though full-blown audio rate would be great.
MC audio is a decent way to handle that I think (in Max).
Independent of the technical-side of things here, curious what kind of musical results and/or funky algorithms you’ve been able to cook up with this, if you have a version working in your setup already.
I can picture some crazy feedback processes where there’s NNs in the signal path.
The shape isn’t unusual, but you’d need to do some wrangling to write / read the weights and biases into a json file of the expected format, which shouldn’t be too hard to do in python.
It’s an sklearn-to-flucoma-mlp script, but it might be a useful reference for formatting the data for the flucoma json. Also, if a pytorch-to-flucoma converter gets made, it might be nice to fold this into some kind of Python Scripts for FluCoMa package. I have some python-NMF ones I use when teaching that are a good extension of the flucoma nmf stuff.
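For anyone curious what that wrangling looks like, here is a rough sketch (not the actual script, and the JSON key names here are assumptions, so compare against a real fluid.mlpregressor~ dump before trusting the layout):

```python
# Hypothetical sketch: pull weights/biases out of a trained sklearn MLPRegressor
# and write them to a JSON file shaped roughly like a fluid.mlpregressor~ dump.
# NOTE: the key names ("layers", "weights", "biases", ...) are guesses --
# check an actual dump from fluid.mlpregressor~ for the real schema.
import json
from sklearn.neural_network import MLPRegressor

def mlp_to_json(model: MLPRegressor, path: str) -> None:
    layers = []
    for W, b in zip(model.coefs_, model.intercepts_):
        layers.append({
            "rows": W.shape[0],        # inputs to this layer
            "cols": W.shape[1],        # outputs of this layer
            "weights": W.tolist(),     # row-major weight matrix
            "biases": b.tolist(),
        })
    with open(path, "w") as f:
        json.dump({"layers": layers, "activation": model.activation}, f, indent=2)

# usage: train on your (input, output) pairs, then dump
# reg = MLPRegressor(hidden_layer_sizes=(8,), activation="relu").fit(X, y)
# mlp_to_json(reg, "regressor_flucoma.json")
```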
@spluta Your project looks very interesting! Do you already have any resources on how to train models for RTNeural for timbre transfer (similar to rave)? I dug through the github repo, but could only find examples for control rate training.
Thanks for your interest in my project. I just got most builds made for SC, pd, and Max and uploaded to the repo. The only one that I haven’t cracked yet is the Max Windows build, which gets a strange error deep in the RTNeural source code. Anyhow, the builds are here under the Releases. (These include Rod’s requested feature of being able to run at audio rate but only run inference when receiving an audio-rate impulse.)
RTNeural_Plugin is more of a general purpose inference engine, so it really comes down to whether you can get your nn into a state that can be loaded by Jatin’s library. The list of available layers is here.
I don’t know the exact shape of Rave models. I know they are autoencoders, but the devil is certainly in the details.
The future goal for this plugin would be to load any pytorch training. The problem is that libtorch is not real-time safe (which is why the Rave stuff crackles), and ONNX is not particularly efficient. Maybe a new library will come along that is both real-time safe and efficient, and that would be relatively easy to implement from where I am at.
This is interesting stuff. So you are training your models presumably with keras and tensorflow to use with this? Or have you cracked the pytorch problem by this point?
The Windows build for Max now works. I had to build with MinGW, not MSVC.
No, actually, I need to document this better, but to see what is going on you need to download the RTNeural_python directory from the release builds. This is where all the trainings happen. There is a README in the root directory on how to set up the virtual environment and then READMEs in the subdirectories on how to do specific trainings. This is by no means exhaustive. The plugin can do way more than what I am showing how to train for.
In most cases I am training in pytorch and converting the trainings to the keras format that RTNeural wants. Trainings should save a .pt file as well as the _RTNeural.json in case there is a future where .pt files can be loaded directly.
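The general idea, stripped right down (the real conversion scripts, and the exact JSON layout RTNeural expects, are in the RTNeural_python directory of the release; this is only a sketch):

```python
# Rough illustration of the idea described above: save both the native .pt
# checkpoint and a JSON containing the weights for the inference side.
import json
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2))

# ... training loop would go here ...

# 1) keep the native checkpoint in case .pt loading arrives later
torch.save(model.state_dict(), "model.pt")

# 2) dump the same weights as plain JSON for the RTNeural-style loader
weights = {name: tensor.tolist() for name, tensor in model.state_dict().items()}
with open("model_RTNeural.json", "w") as f:
    json.dump(weights, f)
```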
Hmm. Might be easy to integrate their code. The interface with the RTNeural inference engine is barely 200 lines of code in my plugin!! That was the easy part. I’ll give it a shot at some point. There are good reasons to use LibTorch and ONNX and reasons not to. The reasons not to are that LibTorch is not real-time safe, and neither is anywhere near efficient enough to use in a real-time setting. But having them available would be good because they would just load any .pt or .onnx file without having to convert to RTNeural format.
Would love to, eventually, see something like this built into FluCoMa natively (I know @balintlaczko has also expressed some desire for a “load your own model” network paradigm), as well as the option for audio-rate stuff too. (The DSP possibilities and examples are pretty exciting!).
Mainly thinking of being able to use the same interface/plumbing for inference.
Thanks for posting that Rod. The reason I made this was because of Owen’s initial skepticism around making audio rate stuff using the FluCoMa framework. In particular the buffer interface would be a hurdle. Jatin’s RTNeural framework, however, is designed for audio. I didn’t intend to make something that would also run at control rate, but it just kind of worked, so I included it.
I wanted to put this in the LSTM thread, but I didn’t want to soil two threads with my own stuff. (And I promise not to keep pointing towards my project after this post. I realize this is the FluCoMa forum, not the Sam’s project forum.)
LSTM Training:
I have found the best repository for LSTM training to be AutomatedGuitarAmpModeling by Alec Wright, which is written using PyTorch. It is used in Proteus and Aida-x and ChowDSP things. I have found it doesn’t just train guitar amps; it works well with time-series things as well. I made an example attached below, which does time-series prediction. I think it could be tweaked to do more complex pitch or word prediction. If not, such scripts certainly exist on GitHub. The python scripts used are included in my repository.
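To be clear, the sketch below is not the attached example or the repo’s scripts, just a toy illustration of the idea: a small PyTorch LSTM trained to predict the next sample of a sequence.

```python
# Toy sketch of LSTM next-sample prediction in PyTorch
# (see the repo / attached example for the real training scripts).
import math
import torch
import torch.nn as nn

class Predictor(nn.Module):
    def __init__(self, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x):                  # x: (batch, time, 1)
        h, _ = self.lstm(x)
        return self.out(h)                 # predict the next sample at every step

# train on a sine wave: input is x[t], target is x[t+1]
t = torch.linspace(0, 8 * math.pi, 2000)
sig = torch.sin(t).reshape(1, -1, 1)
x, y = sig[:, :-1], sig[:, 1:]

model = Predictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```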
Definitely couldn’t work without reconsidering the interface, but your mc. solution seems quite elegant all things considered. Can get gnarly at higher dimensionality, but nothing too crazy.
The pre/post processing does get a bit more complex as you then have to scale/normalize/inversetransform/etc… on signals, rather than buffers. I suppose this could either keep the existing paradigm where each step is a separate object (e.g. mc.pack~ 4 → fluid.mcnormalize~ → fluid.rtneural~ 4 13 → fluid.mcnormalize~ → mc.unpack~ 13) or have more of it baked into a potential fluid.rtneural~ where you can specify models as well as input/output processing pipelines (e.g. mc.pack~ 4 → fluid.rtneural~ 4 13 → mc.unpack~ 13).
The former would also have knock-on effects of being really cool mc.~ processing.
For me, it is the forum where we share techno-musical investigations (or musico-techno-musings) on this sort of thing, so feel free to continue contributing.