If I wanted to make an MLPRegressor that runs at audio rate, how difficult would it be to modify the current project? I can fake it now by setting the BlockSize to 1, but that isn’t a long-term solution. I’d love to have a .ar solution.
Another approach for me would be to embed the neural network in C++. What might be the best library to use if I were to do my training in FluCoMa? libtorch? TensorFlow?
In principle, by no means impossible. It would involve adapting some of the existing C++: making a new UGen based on MLPRegressorQuery that takes audio in/out, deals with input/output vectors instead of buffers, and runs inference N times per vector.
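To give a flavour of the shape of that (purely a hypothetical sketch, assuming nothing about our actual internals; TrainedMLP and processFrame are made-up names), the audio-rate version would gather its inputs sample by sample, run one forward pass per sample, and write the outputs back out:

```cpp
// Hypothetical sketch of per-sample inference inside an audio callback.
// TrainedMLP / processFrame stand in for whatever network object the real
// UGen would hold; they are not FluCoMa API.
#include <cstddef>
#include <vector>

struct TrainedMLP
{
    // Placeholder: a real version would run the trained network's forward pass.
    void processFrame(const std::vector<float>& in, std::vector<float>& out)
    {
        for (std::size_t d = 0; d < out.size(); ++d)
            out[d] = d < in.size() ? in[d] : 0.0f; // identity stand-in
    }
};

// One audio channel per network input and per network output.
void processBlock(TrainedMLP& mlp,
                  const float* const* audioIn, float* const* audioOut,
                  std::size_t numIns, std::size_t numOuts,
                  std::size_t numSamples,
                  std::vector<float>& inVec, std::vector<float>& outVec)
{
    // inVec / outVec are pre-allocated off the audio thread.
    for (std::size_t i = 0; i < numSamples; ++i)
    {
        for (std::size_t d = 0; d < numIns; ++d)
            inVec[d] = audioIn[d][i];       // gather this sample's inputs
        mlp.processFrame(inVec, outVec);    // one forward pass per sample
        for (std::size_t d = 0; d < numOuts; ++d)
            audioOut[d][i] = outVec[d];     // scatter outputs back to the audio buffers
    }
}
```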
Then there’d also be some build system hoo-ha to deal with.
(It’d be nice in cases like this if we had a way for these SC objects to easily support both .kr and .ar but we’ve yet to work it out)
How viable would something like that be for Max as well?
Interface-wise, mc.-stuff would be a good fit in Max, though I don’t know enough about SC to know the difference/significance of what y’all are suggesting for SC there.
I can see it being super useful for what I imagine @spluta is going for: synth regression stuff. Would be bananas to be able to have network stuff in an (audio rate) feedback loop for extra spicy non-linearity.
About equally viable, if not completely identical. That said, @tedmoore’s suggestion applies equally to Max: nn~ could maybe be persuaded to do what you want…
Hmm, I’ll have a further play with it, but from the bit of testing I’ve done it loves to crash/hang, and I never got anything below 4096 samples of latency, regardless of the settings.
That doesn’t sound like the sort of thing that you like! So, per above, one could wrangle a fluid.mlp~ type thing, but there’s still the risk that it would also not be the kind of thing that you like (if nothing else, its performance would be contingent on the network size, and it may well be that networks large enough to interest you are just a non-starter at audio rate. Or a stutterer, rather.)
Would still be useful though. Would just mean more creative management of network structures as I tend to always go fairly large (based on previous suggestions from you!).
Is it conceivable for the MLP to run in parallel on the CPU or alternatively on the GPU? Or is the cost of gathering the threads or moving the tensor to and from the GPU too great to take advantage of the efficiency?
You might investigate how RAVE does it, since they are doing neural synthesis in real time. I’m pretty sure they’re using convolutional approaches, which as an architecture can get large-ish. I don’t know what your use case is; if I were to guess, it would be smaller than a RAVE-type thing.
Generally, shuffling things on and off the GPU is a bad idea. In the video realm, when I’ve attempted such things, the performance hits become quite salient quite quickly. But then again, that’s going to be a different architecture and bandwidth.
I think it really just depends on how long a prediction takes which depends on architecture size.
Yes and no. I have a fully functioning real-time inference engine using RTNeural:
I also made one using onnx runtime:
The ONNX one is very inefficient, but apparently real-time safe. The nice thing about that one is that it will load any ONNX model and run inference. Pretty cool. The RTNeural one is way more efficient, 2-10x depending on the model. It can load any model saved in RTNeural format, which unfortunately means TensorFlow/Keras. I have some scripts to convert PyTorch trainings to TensorFlow/Keras, and I basically have fully functional MLP, LSTM, and GRU models working with translation scripts. For RNN trainings on audio I use the Automated-GuitarAmpModeling library and then convert to RTNeural format. I have an “easier to use” A-GAM trainer on my computer that will go up with the next release.
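For the curious, the RTNeural side boils down to something like this. This is just a sketch of the API shape rather than the actual plugin code, and “model.json” is a placeholder path; check RTNeural’s docs for the exact signatures. It assumes a single-input, single-output network:

```cpp
// Minimal RTNeural sketch: load a model saved in RTNeural's JSON format and
// run one forward pass per sample.
#include <RTNeural/RTNeural.h>
#include <cstddef>
#include <fstream>
#include <vector>

int main()
{
    // Load and reset the network (do this off the audio thread).
    std::ifstream jsonStream("model.json", std::ifstream::binary);
    auto model = RTNeural::json_parser::parseJson<float>(jsonStream);
    model->reset();

    // Stand-in for an audio block: one input and one output per sample.
    std::vector<float> input(64, 0.0f), output(64, 0.0f);
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        float frame[] = { input[i] };
        output[i] = model->forward(frame); // one inference per sample
    }
    return 0;
}
```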
It can do audio rate and control rate inference.
I am going to make a pd external of this and then a Max external.
Things are still a bit messy, but I hopefully will have time to get a cleaner build up there over the break. EDIT: [Just to be clear, the RTNeural one is the one I am focusing on and the one that works better.]
Cool! I was curious because I was messing around with nn~ a while ago and liked some of the things it could do. Once I get some more time it’s something I want to dig into deeper.
One of my discoveries with nn~ is that it uses libtorch, which is not real-time safe. That might be one of the reasons why it has audio issues. My first versions of this used libtorch too; they would just fire up all the CPUs and run at 1000% with the same model that RTNeural runs at <10%.
Basically, I think we are at a crossroads where some industry solution like ONNX or Modular Max will become both ubiquitous and fast/safe enough for audio. RTNeural is the current solution because of its efficiency, but the lack of PyTorch support will be an issue. ONNX is awesome because you can export PyTorch straight to ONNX.
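For comparison, the ONNX Runtime side looks roughly like this. Again, a sketch of the API shape rather than my actual plugin: the 1x8 shape and the “input”/“output” tensor names are assumptions that depend entirely on how the model was exported from PyTorch, and Run() allocates, so you wouldn’t call it naively on the audio thread:

```cpp
// ONNX Runtime C++ sketch: load an exported model and run one inference.
// "model.onnx" is a placeholder (e.g. something produced by torch.onnx.export).
#include <onnxruntime_cxx_api.h>
#include <array>
#include <cstdint>

int main()
{
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "mlp-demo");
    Ort::SessionOptions opts;
    opts.SetIntraOpNumThreads(1);
    Ort::Session session(env, "model.onnx", opts); // note: path is wchar_t* on Windows

    // Wrap an input buffer as a 1x8 tensor (shape depends on the exported model).
    auto memInfo = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    std::array<float, 8> inputData{};
    std::array<int64_t, 2> inputShape{ 1, 8 };
    Ort::Value inputTensor = Ort::Value::CreateTensor<float>(
        memInfo, inputData.data(), inputData.size(),
        inputShape.data(), inputShape.size());

    // Tensor names as assumed at export time.
    const char* inputNames[]  = { "input" };
    const char* outputNames[] = { "output" };
    auto outputs = session.Run(Ort::RunOptions{ nullptr },
                               inputNames, &inputTensor, 1,
                               outputNames, 1);
    float* outData = outputs.front().GetTensorMutableData<float>();
    (void) outData; // ...use the result
    return 0;
}
```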
The hard part here was the infrastructure. Both of my plugins are basically the same; adapting to a new inference engine would be easy.
I’m curious here: for FluidMLPRegressor and eventually FluidLSTMRegressor (or the Forecaster that @lewardo will eventually generalise to series-to-series), my thinking would be to train on other platforms (like @tedmoore did with scikit-learn for the MLP) and import the model. Running them at AR is not that expensive…
Now, I don’t have plans for other network technologies (stochastic and/or convolutional, for instance), and we’re more in control-land here considering the toolset agenda, but hey, I’m still curious to see if we can make it hackable…