Audio rate MLPRegressor

Two related questions:

If I wanted to make an MLPRegressor that runs at audio rate, how difficult would it be to modify the current project to do that? I can fake it now by setting the BlockSize to 1, but that isn’t a long-term solution. I’d love to have a .ar solution.

Another approach for me would be to embed the neural network in C++. What might be the best library to use if I were to do my training in FluCoMa? libtorch? TensorFlow?

Sam

In principle, by no means impossible. It would involve adapting some of the existing C++: making a new UGen based on MLPRegressorQuery that takes audio in / out, deals with input / output vectors instead of buffers, and runs inference N times per vector.
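To make "runs inference N times per vector" concrete, here is a minimal sketch of the per-frame loop such a UGen would need. Everything in it is an assumption for illustration: `TinyModel` and `processBlock` are hypothetical names, the "inference" is a placeholder, and none of this is the SuperCollider plugin API or FluCoMa's actual client code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a trained network: maps nIn values to nOut values.
// In a real UGen this would wrap the MLP's forward pass.
struct TinyModel {
  std::size_t nIn;
  std::size_t nOut;

  // Placeholder "inference": output o is the sum of the inputs times (o + 1).
  void predict(const float* in, float* out) const {
    float sum = 0.f;
    for (std::size_t i = 0; i < nIn; ++i) sum += in[i];
    for (std::size_t o = 0; o < nOut; ++o) out[o] = sum * static_cast<float>(o + 1);
  }
};

// Process one audio vector by running inference once per sample frame.
// inputs[c][n] and outputs[c][n] are non-interleaved channel buffers.
// scratchIn / scratchOut are preallocated by the caller so that nothing
// allocates inside the audio callback.
inline void processBlock(const TinyModel& model,
                         const float* const* inputs,
                         float* const* outputs,
                         std::size_t numFrames,
                         std::vector<float>& scratchIn,
                         std::vector<float>& scratchOut) {
  assert(scratchIn.size() == model.nIn && scratchOut.size() == model.nOut);
  for (std::size_t n = 0; n < numFrames; ++n) {
    // Gather frame n from every input channel into one input vector.
    for (std::size_t c = 0; c < model.nIn; ++c) scratchIn[c] = inputs[c][n];
    model.predict(scratchIn.data(), scratchOut.data());
    // Scatter the prediction back out, one value per output channel.
    for (std::size_t c = 0; c < model.nOut; ++c) outputs[c][n] = scratchOut[c];
  }
}
```

The point of the shape is that the cost scales with the vector size: a 64-sample vector means 64 forward passes per callback, which is the same work as BlockSize 1 but without the buffer round trip.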

Then there’d also be some build system hoo-ha to deal with.

(It’d be nice in cases like this if we had a way for these SC objects to easily support both .kr and .ar, but we’ve yet to work it out.)

As far as ‘embedding’ goes, if you wanted something less connected to all our plumbing, you could still use our underlying MLP implementation (include/algorithms/public/MLP.hpp on the main branch of flucoma/flucoma-core on GitHub) and work out your own way of getting the trained weights in there.
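For a sense of how little is needed once the trained weights are in hand, here is a rough standalone forward pass. To be clear about assumptions: this is not the interface of MLP.hpp, the names (`DenseLayer`, `TinyMLP`, `prepare`, `predict`) are made up for the example, the row-major weight layout is just a choice, and it says nothing about how the weights get exported from a trained MLPRegressor.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

enum class Activation { Identity, ReLU, Tanh, Sigmoid };

// One fully connected layer. Layout is an assumption of this sketch:
// weights are row-major, nOut rows by nIn columns.
struct DenseLayer {
  std::size_t nIn;
  std::size_t nOut;
  std::vector<float> weights;  // size nOut * nIn
  std::vector<float> biases;   // size nOut
  Activation act;
};

inline float activate(float x, Activation a) {
  switch (a) {
    case Activation::ReLU:    return x > 0.f ? x : 0.f;
    case Activation::Tanh:    return std::tanh(x);
    case Activation::Sigmoid: return 1.f / (1.f + std::exp(-x));
    case Activation::Identity: break;
  }
  return x;
}

// out = act(W * in + b). in and out must not alias.
inline void forwardLayer(const DenseLayer& l, const float* in, float* out) {
  assert(l.weights.size() == l.nIn * l.nOut && l.biases.size() == l.nOut);
  for (std::size_t o = 0; o < l.nOut; ++o) {
    float acc = l.biases[o];
    for (std::size_t i = 0; i < l.nIn; ++i) acc += l.weights[o * l.nIn + i] * in[i];
    out[o] = activate(acc, l.act);
  }
}

struct TinyMLP {
  std::vector<DenseLayer> layers;
  std::vector<float> bufA, bufB;  // ping-pong scratch, sized once in prepare()

  // Call once after filling `layers`, outside the audio thread.
  void prepare() {
    std::size_t widest = 0;
    for (const auto& l : layers) widest = std::max(widest, l.nOut);
    bufA.assign(widest, 0.f);
    bufB.assign(widest, 0.f);
  }

  // Returns a pointer to the last layer's outputs, valid until the next call.
  const float* predict(const float* in) {
    const float* src = in;
    float* dst = bufA.data();
    float* other = bufB.data();
    for (const auto& l : layers) {
      forwardLayer(l, src, dst);
      src = dst;
      std::swap(dst, other);  // next layer writes to the other buffer
    }
    return src;
  }
};
```

`predict` does no allocation, so it could in principle be called once per sample from an audio callback, provided `prepare` was called beforehand. Whether it is fast enough depends on the layer sizes.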

Awesome. This is really helpful. The latter approach seems correct…or at least easier. I’ll come back with questions if I have them.

Thanks,

Sam

How viable would something like that be for Max as well?

Interface-wise, mc.-stuff would be a good fit in Max, though I don’t know enough about SC to know the difference/significance of what y’all are suggesting for SC there.

I can see it being super useful for what I imagine @spluta is going for: synth regression stuff. Would be bananas to be able to have network stuff in an (audio rate) feedback loop for extra spicy non-linearity.

I’m guessing you’ve checked out elgiano/nn.ar on GitHub (an nn_tilde adaptation for SuperCollider)? I know he put it together because of RAVE, but I think it is designed to load any torch model, so maybe it will have some useful plumbing or ideas.

Thanks. That might have saved me two weeks of work, haha.

About equally viable, if not completely identical. Although @tedmoore’s suggestion applies equally to Max: nn~ could maybe be persuaded to do what you want…

Hmm, I’ll have a further play with it, but from the bit of testing I’ve done, it loves to crash/hang, and I never got anything below 4096 latency with it, regardless of the settings.

That doesn’t sound like the sort of thing that you like! So, per above, one could wrangle a fluid.mlp~ type thing, but there’s still the risk that it would also not be the kind of thing that you like (if nothing else, its performance would be contingent on the network size, and it may well be that networks large enough to interest you are just a non-starter at audio rate. Or a stutterer, rather).

Would still be useful though. Would just mean more creative management of network structures as I tend to always go fairly large (based on previous suggestions from you!).

Is it conceivable for the MLP to run in parallel on the CPU or alternatively on the GPU? Or is the cost of gathering the threads or moving the tensor to and from the GPU too great to take advantage of the efficiency?

edit: in the audio thread

Sam

You might investigate how RAVE does it, since they are doing neural synthesis in real time. I’m pretty sure they’re using convolutional approaches, which as an architecture can get large-ish. I don’t know what your use case is, but if I were to guess, it would be smaller than a RAVE-type thing.

Generally, shuffling things on and off the GPU is a bad idea. In the video realm, when I’ve attempted such things, the performance hits are quite salient quite quickly. But then again, that’s going to be a different architecture and bandwidth.

I think it really just depends on how long a prediction takes, which depends on architecture size.
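A back-of-envelope way to put numbers on that, as a sketch only: count the multiply-accumulates (MACs) in one forward pass of a fully connected network and compare against the time available per sample. The function names are hypothetical, and the count ignores biases, activation functions, and memory effects, so treat it as a rough lower bound on the work rather than a prediction of real performance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiply-accumulates for one forward pass of a fully connected network.
// `widths` lists layer sizes from input to output, e.g. {2, 64, 64, 2}.
inline std::size_t macsPerInference(const std::vector<std::size_t>& widths) {
  std::size_t total = 0;
  for (std::size_t i = 0; i + 1 < widths.size(); ++i) {
    total += widths[i] * widths[i + 1];
  }
  return total;
}

// MACs per second if the network is evaluated once per sample.
inline double macsPerSecond(const std::vector<std::size_t>& widths, double sampleRate) {
  assert(sampleRate > 0.0);
  return static_cast<double>(macsPerInference(widths)) * sampleRate;
}

// Wall-clock budget for one inference at audio rate, in microseconds.
inline double microsecondsPerSample(double sampleRate) {
  assert(sampleRate > 0.0);
  return 1.0e6 / sampleRate;
}
```

For example, a 2-64-64-2 network is 2·64 + 64·64 + 64·2 = 4352 MACs per inference, which at 44100 Hz comes to about 192 million MACs per second, with roughly 22.7 µs available for each prediction. Doubling the hidden width roughly quadruples the dominant term.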

Looking at nn.ar, it’s using libtorch running on the CPU, as far as I can tell.
