So what kind of input/output will your Max thing do? Buffers?
I was wondering, after posting this, if audio writing/reading can happen at audio rate, with the right threading.
Working on a project with a buddy for the upcoming NIME about using a genetic algorithm to crawl a synth space, then have a “timbre analogy” created from realtime input (basically a slightly more lofi but more generalizable version of this approach using DDSP).
As part of that he’s built a thing where he can train up a regressor in Python and then dump out a .json that can be loaded into the FluCoMa stuff, with the structure/activations all matching.
Once that’s up and running will share that as well.
Audio or a list. It is designed for DSP processing so the MSP object will output multichannel audio. The Max object will output a list. In pd you snake the audio together then unsnake the output. Hopefully that will be as easy in Max.
I like a triggerable audio object though. I hadn’t thought of that. That would still send audio as its output, but it will only run inference when triggered. So you could get a changing output faster than the control rate, but not have to do thousands of calculations every sample. Seems reasonable. Is that what you are imagining?
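Roughly, the idea would be something like this (a conceptual sketch only, not the plugin code; it assumes a `model` callable that maps one input frame to one output frame):

```python
# Conceptual sketch of "triggerable" audio-rate inference:
# the output stays at audio rate, but the network only runs on samples where
# the trigger signal is nonzero; in between, the last result is held.
import numpy as np

def process_block(model, inputs, trigger, last_out):
    """inputs: (n_in, n_samples), trigger: (n_samples,), last_out: (n_out,)"""
    n_out, n_samples = last_out.shape[0], trigger.shape[0]
    out = np.empty((n_out, n_samples))
    for s in range(n_samples):
        if trigger[s] != 0.0:                  # impulse: run inference now
            last_out = model(inputs[:, s])     # assumes model maps (n_in,) -> (n_out,)
        out[:, s] = last_out                   # otherwise sample-and-hold
    return out, last_out
```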
The use case I’m working on just needs to happen once per onset, so it would be ideal to have the whole signal path stay in the signal domain for speed/timing.
Though full-blown audio rate would be great.
MC audio is a decent way to handle that I think (in Max).
Independent of the technical-side of things here, curious what kind of musical results and/or funky algorithms you’ve been able to cook up with this, if you have a version working in your setup already.
I can picture some crazy feedback processes where there’s NNs in the signal path.
The shape isn’t unusual, but you’d need to do some wrangling to write / read the weights and biases into a json file of the expected format, which shouldn’t be too hard to do in python.
It’s an sklearn-to-flucoma-mlp script, but it might be a useful reference for formatting the data for the flucoma json. Also, if a pytorch-to-flucoma converter gets made, it might be nice to fold this into some kind of Python Scripts for FluCoMa package. I have some python-NMF ones I use when teaching that are a good extension of the flucoma nmf stuff.
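For anyone curious what that wrangling looks like, here is a rough sketch (not the actual script, and the JSON key names here are assumptions, so compare against a real fluid.mlpregressor~ dump before trusting the layout):

```python
# Hypothetical sketch: pull weights/biases out of a trained sklearn MLPRegressor
# and write them to a JSON file shaped roughly like a fluid.mlpregressor~ dump.
# NOTE: the key names ("layers", "weights", "biases", ...) are guesses --
# check an actual dump from fluid.mlpregressor~ for the real schema.
import json
from sklearn.neural_network import MLPRegressor

def mlp_to_json(model: MLPRegressor, path: str) -> None:
    layers = []
    for W, b in zip(model.coefs_, model.intercepts_):
        layers.append({
            "rows": W.shape[0],        # inputs to this layer
            "cols": W.shape[1],        # outputs of this layer
            "weights": W.tolist(),     # row-major weight matrix
            "biases": b.tolist(),
        })
    with open(path, "w") as f:
        json.dump({"layers": layers, "activation": model.activation}, f, indent=2)

# usage: train on your (input, output) pairs, then dump
# reg = MLPRegressor(hidden_layer_sizes=(8,), activation="relu").fit(X, y)
# mlp_to_json(reg, "regressor_flucoma.json")
```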
@spluta Your project looks very interesting! Do you already have any resources on how to train models for RTNeural for timbre transfer (similar to rave)? I dug through the github repo, but could only find examples for control rate training.
Thanks for your interest in my project. I just got most builds made for SC, pd, and Max and uploaded to the repo. The only one that I haven’t cracked yet is the Max Windows build, which gets a strange error deep in the RTNeural source code. Anyhow, the builds are here under the Releases. (These include Rod’s requested feature of being able to run at audio rate but only run inference when receiving an audio-rate impulse.)
RTNeural_Plugin is more of a general purpose inference engine, so it really comes down to whether you can get your nn into a state that can be loaded by Jatin’s library. The list of available layers is here.
I don’t know the exact shape of Rave models. I know they are autoencoders, but the devil is certainly in the details.
The future goal for this plugin would be to load any pytorch training. The problem is that libtorch is not real-time safe (which is why the Rave stuff crackles), and ONNX is not particularly efficient. Maybe a new library will come along that is both real-time safe and efficient, and that would be relatively easy to implement from where I am at.
This is interesting stuff. So you are training your models presumably with keras and tensorflow to use with this? Or have you cracked the pytorch problem by this point?
The Windows build for Max now works. I had to build with MinGW, not MSVC.
No, actually, I need to document this better, but to see what is going on you need to download the RTNeural_python directory from the release builds. This is where all the trainings happen. There is a README in the root directory on how to set up the virtual environment and then READMEs in the subdirectories on how to do specific trainings. This is by no means exhaustive. The plugin can do way more than what I am showing how to train for.
In most cases I am training in pytorch and converting the trainings to the keras format that RTNeural wants. Trainings should save a .pt file as well as the _RTNeural.json in case there is a future where .pt files can be loaded directly.
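The general idea, stripped right down (the real conversion scripts, and the exact JSON layout RTNeural expects, are in the RTNeural_python directory of the release; this is only a sketch):

```python
# Rough illustration of the idea described above: save both the native .pt
# checkpoint and a JSON containing the weights for the inference side.
import json
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2))

# ... training loop would go here ...

# 1) keep the native checkpoint in case .pt loading arrives later
torch.save(model.state_dict(), "model.pt")

# 2) dump the same weights as plain JSON for the RTNeural-style loader
weights = {name: tensor.tolist() for name, tensor in model.state_dict().items()}
with open("model_RTNeural.json", "w") as f:
    json.dump(weights, f)
```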
Hmm. Might be easy to integrate their code. The interface with the RTNeural inference engine is barely 200 lines of code in my plugin!! That was the easy part. I’ll give it a shot at some point. There are good reasons to use LibTorch and ONNX and reasons not to. The reasons not to are that LibTorch is not real-time safe, and neither is anywhere near efficient enough to use in a real-time setting. But having them available would be good because they would just load any .pt or .onnx file without having to convert to RTNeural format.
Would love to, eventually, see something like this built into FluCoMa natively (I know @balintlaczko has also expressed some desire for a “load your own model” network paradigm), as well as the option for audio-rate stuff too. (The DSP possibilities and examples are pretty exciting!).
Mainly thinking of being able to use the same interface/plumbing for inference.
Thanks for posting that Rod. The reason I made this was because of Owen’s initial skepticism around making audio rate stuff using the FluCoMa framework. In particular the buffer interface would be a hurdle. Jatin’s RTNeural framework, however, is designed for audio. I didn’t intend to make something that would also run at control rate, but it just kind of worked, so I included it.
I wanted to put this in the LSTM thread, but I didn’t want to soil two threads with my own stuff. (And I promise not to keep pointing towards my project after this post. I realize this is the FluCoMa forum, not the Sam’s project forum.)
LSTM Training:
I have found the best repository for LSTM training to be AutomatedGuitarAmpModeling by Alec Wright, which is written using PyTorch. It is used in Proteus and Aida-x and ChowDSP things. I have found it doesn’t just train guitar amps; it works well with time-series things as well. I made an example attached below, which does time-series prediction. I think it could be tweaked to do more complex pitch or word prediction. If not, such scripts certainly exist on GitHub. The python scripts used are included in my repository.
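To be clear, the sketch below is not the attached example or the repo’s scripts, just a toy illustration of the idea: a small PyTorch LSTM trained to predict the next sample of a sequence.

```python
# Toy sketch of LSTM next-sample prediction in PyTorch
# (see the repo / attached example for the real training scripts).
import math
import torch
import torch.nn as nn

class Predictor(nn.Module):
    def __init__(self, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x):                  # x: (batch, time, 1)
        h, _ = self.lstm(x)
        return self.out(h)                 # predict the next sample at every step

# train on a sine wave: input is x[t], target is x[t+1]
t = torch.linspace(0, 8 * math.pi, 2000)
sig = torch.sin(t).reshape(1, -1, 1)
x, y = sig[:, :-1], sig[:, 1:]

model = Predictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```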
Definitely couldn’t work without reconsidering the interface, but your mc. solution seems quite elegant all things considered. Can get gnarly at higher dimensionality, but nothing too crazy.
The pre/post processing does get a bit more complex as you then have to scale/normalize/inversetransform/etc… on signals, rather than buffers. I suppose this could either keep the existing paradigm where each step is a separate object (e.g. mc.pack~ 4 → fluid.mcnormalize~ → fluid.rtneural~ 4 13 → fluid.mcnormalize~ → mc.unpack~ 13) or have more of it baked into a potential fluid.rtneural~ where you can specify models as well as input/output processing pipelines (e.g. mc.pack~ 4 → fluid.rtneural~ 4 13 → mc.unpack~ 13).
The former would also have knock-on effects of being really cool mc.~ processing.
For me, it is the forum where we share techno-musical investigations (or musico-techno-musings) on this sort of thing, so feel free to continue contributing.