So what kind of input/output will your Max thing do? Buffers?
I was wondering, after posting this, if audio writing/reading can happen at audio rate, with the right threading.
Working on a project with a buddy for the upcoming NIME about using a genetic algorithm to crawl a synth space, then have a “timbre analogy” created from realtime input (basically a slightly more lofi but more generalizable version of this approach using DDSP).
As part of that he’s built a thing where he can train up a regressor in Python and then dump out a .json that can be loaded into the FluCoMa stuff, with the structure/activations all matching.
Once that's up and running I'll share that as well.
Audio or a list. It is designed for DSP processing, so the MSP object will output multichannel audio. The Max object will output a list. In Pd you snake the audio together, then unsnake the output. Hopefully that will be as easy in Max.
I like the idea of a triggerable audio object, though. I hadn't thought of that. It would still send audio as its output, but would only run inference when triggered. So you could get a changing output faster than the control rate without having to do thousands of calculations every sample. Seems reasonable. Is that what you are imagining?
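In rough pseudocode, the idea would be something like this (a minimal NumPy sketch of the per-block loop, not the actual plugin code; `process_block`, `model`, etc. are just placeholder names):

```python
# Sketch of trigger-gated inference at audio rate (illustrative only).
# The network runs once per nonzero trigger sample; its last output is
# held at audio rate in between, so the output can change faster than
# the control rate without running the model on every sample.
import numpy as np

def process_block(model, inputs, trigger, held):
    """inputs: (n_samples, n_in), trigger: (n_samples,), held: last output frame."""
    out = np.empty((len(trigger), held.shape[0]))
    for i in range(len(trigger)):
        if trigger[i] != 0.0:        # e.g. an onset impulse
            held = model(inputs[i])  # run inference just this once
        out[i] = held                # otherwise hold the previous result
    return out, held
```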
The use case I’m working on just needs to happen once per onset, so it would be ideal to have the whole signal path stay in the signal domain for speed/timing.
Though full-blown audio rate would be great.
MC audio is a decent way to handle that I think (in Max).
Independent of the technical side of things here, I'm curious what kind of musical results and/or funky algorithms you've been able to cook up with this, if you already have a version working in your setup.
I can picture some crazy feedback processes where there’s NNs in the signal path.
The shape isn't unusual, but you'd need to do some wrangling to write/read the weights and biases into a JSON file of the expected format, which shouldn't be too hard to do in Python.
It's an sklearn-to-flucoma-mlp script, but it might be a useful reference for formatting the data for the FluCoMa JSON. Also, if a pytorch-to-flucoma script gets made, it might be nice to fold this into some kind of "Python Scripts for FluCoMa" package. I have some Python NMF scripts I use when teaching that are a good extension of the FluCoMa NMF stuff.
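For reference, the core of that kind of export is just pulling `coefs_`/`intercepts_` out of a fitted sklearn MLPRegressor and dumping them to JSON. Here's a minimal sketch; the key names ("layers", "weights", "biases", "activation") are placeholders, so match them against a JSON actually written out by fluid.mlpregressor~ before trying to load it:

```python
# Minimal sketch: dump a fitted sklearn MLPRegressor to JSON.
# The field names below are placeholders -- copy the real names/nesting
# from a JSON written by FluCoMa itself.
import json
from sklearn.neural_network import MLPRegressor

def export_mlp(model: MLPRegressor, path: str):
    n = len(model.coefs_)
    layers = []
    for i, (W, b) in enumerate(zip(model.coefs_, model.intercepts_)):
        layers.append({
            "weights": W.tolist(),   # shape (n_in, n_out) for this layer
            "biases": b.tolist(),
            # sklearn uses one activation for hidden layers and identity on the output
            "activation": model.activation if i < n - 1 else "identity",
        })
    with open(path, "w") as f:
        json.dump({"layers": layers}, f, indent=2)
```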
@spluta Your project looks very interesting! Do you already have any resources on how to train models for RTNeural for timbre transfer (similar to RAVE)? I dug through the GitHub repo, but could only find examples for control-rate training.
Thanks for your interest in my project. I just got most builds made for SC, Pd, and Max and uploaded to the repo. The only one I haven't cracked yet is the Max Windows build, which gets a strange error deep in the RTNeural source code. Anyhow, the builds are here under the Releases. (These include Rod's requested feature of being able to run at audio rate, but only running inference when receiving an audio-rate impulse.)
RTNeural_Plugin is more of a general-purpose inference engine, so it really comes down to whether you can get your NN into a state that can be loaded by Jatin's library. The list of available layers is here.
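For example, if you can express the network as a small Keras model using only those supported layer types (Dense, GRU, LSTM, Conv1D, etc.), the RTNeural repo ships a Python helper for writing out the JSON its parser reads. A sketch, assuming the helper is `save_model` from `python/model_utils.py` in that repo (that name is from memory, so verify against the repo before relying on it):

```python
# Sketch: tiny Keras model built only from layer types RTNeural supports,
# then exported with the helper from RTNeural's python/ folder.
import tensorflow as tf
from model_utils import save_model  # RTNeural/python/model_utils.py -- name from memory, verify

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="tanh", input_shape=(8,)),
    tf.keras.layers.Dense(2, activation="tanh"),
])
# ...train the model as usual...
save_model(model, "tiny_net.json")  # JSON intended for RTNeural's parser
```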
I don't know the exact shape of RAVE models. I know they are autoencoders, but the devil is certainly in the details.
The future goal for this plugin would be to load any PyTorch-trained model. The problem is that libtorch is not real-time safe (which is why the RAVE stuff crackles), and ONNX is not particularly efficient. Maybe a new library will come along that is both real-time safe and efficient; that would be relatively easy to implement from where I am now.