If I wanted to make an MLPRegressor that runs at audio rate, how difficult would it be to modify the current project? I can fake it now by setting the BlockSize to 1, but that isn’t a long-term solution. I’d love to have a .ar solution.
Another approach for me would be to embed the neural network in C++. What might be the best library to use if I were to do my training in FluCoMa? libtorch? TensorFlow?
In principle, by no means impossible. It would involve adapting some of the existing C++: making a new UGen based on MLPRegressorQuery that takes audio in/out, deals with input/output vectors instead of buffers, and runs inference N times per vector.
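To give a flavour of the shape of that (purely a hypothetical sketch, assuming nothing about our actual internals; TrainedMLP and processFrame are made-up names), the audio-rate version would gather its inputs sample by sample, run one forward pass per sample, and write the outputs back out:

```cpp
// Hypothetical sketch of per-sample inference inside an audio callback.
// TrainedMLP / processFrame stand in for whatever network object the real
// UGen would hold; they are not FluCoMa API.
#include <cstddef>
#include <vector>

struct TrainedMLP
{
    // Placeholder: a real version would run the trained network's forward pass.
    void processFrame(const std::vector<float>& in, std::vector<float>& out)
    {
        for (std::size_t d = 0; d < out.size(); ++d)
            out[d] = d < in.size() ? in[d] : 0.0f; // identity stand-in
    }
};

// One audio channel per network input and per network output.
void processBlock(TrainedMLP& mlp,
                  const float* const* audioIn, float* const* audioOut,
                  std::size_t numIns, std::size_t numOuts,
                  std::size_t numSamples,
                  std::vector<float>& inVec, std::vector<float>& outVec)
{
    // inVec / outVec are pre-allocated off the audio thread.
    for (std::size_t i = 0; i < numSamples; ++i)
    {
        for (std::size_t d = 0; d < numIns; ++d)
            inVec[d] = audioIn[d][i];       // gather this sample's inputs
        mlp.processFrame(inVec, outVec);    // one forward pass per sample
        for (std::size_t d = 0; d < numOuts; ++d)
            audioOut[d][i] = outVec[d];     // scatter outputs back to the audio buffers
    }
}
```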
Then there’d also be some build system hoo-ha to deal with.
(It’d be nice in cases like this if we had a way for these SC objects to easily support both .kr and .ar but we’ve yet to work it out)
How viable would something like that be for Max as well?
Interface-wise, mc.-stuff would be a good fit in Max, though I don’t know enough about SC to know the difference/significance of what y’all are suggesting for SC there.
I can see it being super useful for what I imagine @spluta is going for: synth regression stuff. Would be bananas to be able to have network stuff in an (audio rate) feedback loop for extra spicy non-linearity.
About equally viable, if not completely identical. That said, @tedmoore’s suggestion applies equally to Max: nn~ could maybe be persuaded to do what you want…
Hmm, I’ll have a further play with it, but from the bit of testing I’ve done it loves to crash/hang, and I never got anything below 4096 samples of latency, regardless of the settings.
That doesn’t sound like the sort of thing that you like! So, per above, one could wrangle a fluid.mlp~ type thing, but there’s still the risk that it would also not be the kind of thing that you like (if nothing else, its performance would be contingent on the network size, and it may well be that networks large enough to interest you are just a non-starter at audio rate. Or a stutterer, rather.)
Would still be useful though. Would just mean more creative management of network structures as I tend to always go fairly large (based on previous suggestions from you!).
Is it conceivable for the MLP to run in parallel on the CPU or alternatively on the GPU? Or is the cost of gathering the threads or moving the tensor to and from the GPU too great to take advantage of the efficiency?
You might investigate how RAVE does it, since they are doing neural synthesis in real time. I’m pretty sure they’re using convolutional approaches, which as an architecture can get large-ish. I don’t know what your use case is; if I were to guess, it would be smaller than a RAVE-type thing.
Generally, shuffling things on and off the GPU is a bad idea. In the video realm, when I’ve attempted such things, the performance hits become quite salient quite quickly. But then again, that’s going to be a different architecture and bandwidth.
I think it really just depends on how long a prediction takes which depends on architecture size.
Yes and no. I have a fully functioning real-time inference engine using RTNeural:
I also made one using onnx runtime:
The ONNX one is very inefficient, but apparently real-time safe. The nice thing about that one is that it will load any ONNX model and run inference. Pretty cool. The RTNeural one is way more efficient, 2-10x depending on the model. It can load any model saved in RTNeural format, which unfortunately means TensorFlow/Keras. I have some scripts to convert PyTorch trainings to TensorFlow/Keras, and I basically have fully functional MLP, LSTM, and GRU models working with translation scripts. For RNN trainings on audio I use the Automated-GuitarAmpModeling library and then convert to RTNeural format. I have an “easier to use” A-GAM trainer on my computer that will go up with the next release.
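For the curious, the RTNeural side boils down to something like this. This is just a sketch of the API shape rather than the actual plugin code, and “model.json” is a placeholder path; check RTNeural’s docs for the exact signatures. It assumes a single-input, single-output network:

```cpp
// Minimal RTNeural sketch: load a model saved in RTNeural's JSON format and
// run one forward pass per sample.
#include <RTNeural/RTNeural.h>
#include <cstddef>
#include <fstream>
#include <vector>

int main()
{
    // Load and reset the network (do this off the audio thread).
    std::ifstream jsonStream("model.json", std::ifstream::binary);
    auto model = RTNeural::json_parser::parseJson<float>(jsonStream);
    model->reset();

    // Stand-in for an audio block: one input and one output per sample.
    std::vector<float> input(64, 0.0f), output(64, 0.0f);
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        float frame[] = { input[i] };
        output[i] = model->forward(frame); // one inference per sample
    }
    return 0;
}
```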
It can do audio rate and control rate inference.
I am going to make a pd external of this and then a Max external.
Things are still a bit messy, but I hopefully will have time to get a cleaner build up there over the break. EDIT: [Just to be clear, the RTNeural one is the one I am focusing on and the one that works better.]
Cool! I was curious because I was messing around with nn~ a while ago and liked some of the things it could do. Once I get some more time it’s something I want to dig into deeper.
One of my discoveries with nn~ is that it uses libtorch, which is not real-time safe. That might be one of the reasons why it has audio issues. My first versions of this used libtorch too; they would just fire up all the CPUs and run at 1000% with the same model that RTNeural runs at <10%.
Basically, I think we are at a crossroads where some industry solution like ONNX or Modular Max will become both ubiquitous and fast/safe enough for audio. RTNeural is the current solution because of its efficiency, but the lack of PyTorch support will be an issue. ONNX is awesome because you can export PyTorch straight to ONNX.
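For comparison, the ONNX Runtime side looks roughly like this. Again, a sketch of the API shape rather than my actual plugin: the 1x8 shape and the “input”/“output” tensor names are assumptions that depend entirely on how the model was exported from PyTorch, and Run() allocates, so you wouldn’t call it naively on the audio thread:

```cpp
// ONNX Runtime C++ sketch: load an exported model and run one inference.
// "model.onnx" is a placeholder (e.g. something produced by torch.onnx.export).
#include <onnxruntime_cxx_api.h>
#include <array>
#include <cstdint>

int main()
{
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "mlp-demo");
    Ort::SessionOptions opts;
    opts.SetIntraOpNumThreads(1);
    Ort::Session session(env, "model.onnx", opts); // note: path is wchar_t* on Windows

    // Wrap an input buffer as a 1x8 tensor (shape depends on the exported model).
    auto memInfo = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    std::array<float, 8> inputData{};
    std::array<int64_t, 2> inputShape{ 1, 8 };
    Ort::Value inputTensor = Ort::Value::CreateTensor<float>(
        memInfo, inputData.data(), inputData.size(),
        inputShape.data(), inputShape.size());

    // Tensor names as assumed at export time.
    const char* inputNames[]  = { "input" };
    const char* outputNames[] = { "output" };
    auto outputs = session.Run(Ort::RunOptions{ nullptr },
                               inputNames, &inputTensor, 1,
                               outputNames, 1);
    float* outData = outputs.front().GetTensorMutableData<float>();
    (void) outData; // ...use the result
    return 0;
}
```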
The hard part here was the infrastructure. Both of my plugins are basically the same; adapting to a new inference engine would be easy.
I’m curious here: for FluidMLPRegressor and eventually FluidLSTMRegressor (or the Forecaster that @lewardo will eventually generalise to series-to-series), my thinking would be to train on other platforms (like @tedmoore did with scikit-learn for the MLP) and import the model. Running them at AR is not that expensive…
Now, I don’t have plans for other network technologies (stochastic and/or convolutional, for instance), and we’re more in control-land here considering the toolset agenda, but hey, I’m still curious to see if we can make it hackable…