drumGAN - how to make boring drum sounds in more steps

Hello everyone

I’m really new to all of this, but I succeeded in getting the prebuilt nn~ working in Max/MSP on Mac. One thing I can’t understand, though, is that I couldn’t find any .ts file anywhere in this GitHub, not even in the vst directory. If anyone could send me some of these .ts files, especially ones containing vocal samples like in the RAVE YouTube video, it would be much appreciated!

Thank you so much !

Anais

Do you have the “wheel.ts” that you get with the help file? Otherwise, .ts files seem to be thin on the ground; there is much discussion on RAVE’s GitHub about access to them. To be honest, it is probably because most people are getting questionable results. That is certainly the case with me - I am finishing up a .ts file today with lupophone samples and will post the link for you all here if it turns out to be usable.

1 Like

FWIW here is a link to my .ts file. With judicious use of gating, filtering, compressing etc. you can get a kind of timbre transfer. If you make pitches above middle C, it will follow those pitches, sometimes. If you want an errant monster, mutant goose shadowing you, you will love this. https://drive.google.com/file/d/1Uk8EdczLAmge3LDCHHl7jFeC1OkBOAy_/view?usp=sharing
(warning, over 160 MB)

1 Like

you just described my favourite processing, ever!

1 Like

ok, this is fun… like really, really fun! Running modular synth stuff through it, I suddenly wish I could notate that for a flautist :slight_smile: And now I need to do a violin model for my current work in progress.

1 Like

continuing to play with this - it is fun to compare it to ddsp’s flute model too. They each translate my naughty synths differently, and I like what each one keeps - the noise part is fascinating and I can probably use that to excite resonators… just thinking aloud here, but this is very inspiring, thanks again for sharing!

1 Like

Thank you so much ! I’m gonna try it :wink:

Hello Bledsoeflute,
Thank you for your reply. Yes, I managed to get the wheel.ts by opening the nn~ help file in Max/MSP.
I have tried looking on GitHub, but haven’t found the right discussion yet, and people seem to be less responsive there…
Just using the wheel model is already opening so many possibilities - I’m still discovering it, but I’m enthusiastic about the idea of trying some new ones!
It’s really nice of you to share yours, and I’m already excited about transforming into a cavernous monster, let’s try it out :slight_smile:
Thank you!

1 Like

hello all,

still on the off-topic RAVE subject, there is also access to models here: https://neutone.space/models/
as for GPU training other than Colab, I came across datacrunch.io

in case someone has tried to use RAVE in the terminal (as I don’t use Max): I gave the reconstruction a shot, running the proposed line:

python reconstruct.py --ckpt /path/to/checkpoint --wav-folder /path/to/wav/folder

and I wonder whether the .ts file is what is meant by the checkpoint, or something entirely different?
I suppose both the model and the target folder would have to be included.
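
my rough guess (untested) is that the .ts is the exported TorchScript model that nn~ loads, while the checkpoint is the raw training checkpoint, so they are probably not interchangeable. just to show what I mean, this is how I would expect a .ts to be usable straight from Python - the file names, the 44.1 kHz rate and the 2048 block size are only assumptions on my part:

import torch
import librosa
import soundfile as sf

# sketch only: run a wav through an exported .ts (TorchScript) model
model = torch.jit.load("wheel.ts").eval()
audio, sr = librosa.load("input.wav", sr=44100, mono=True)

block = 2048                                       # guess at the model's block size
n = (len(audio) // block) * block                  # trim to whole blocks, just in case
x = torch.from_numpy(audio[:n]).reshape(1, 1, -1)  # (batch, channel, samples)

with torch.no_grad():
    y = model.forward(x)                           # full encode/decode round trip

sf.write("reconstructed.wav", y.reshape(-1).numpy(), sr)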

(edited out the aforementioned error, as no GPU had been found - I probably need to properly install PyTorch)

thanks,
jan

Revisiting some of this as I’ve been reading through a thesis that relies heavily on RAVE/nn~. I was watching one of the older videos that first introduced RAVE (?) and it talks about how quick the computation is once trained (20-80x realtime). I guess on the realtime end it still has to window something to start the analysis/resynthesis, as there’s definitely audible latency in the more recent demo video:

It did get me thinking/wondering about leveraging the short transients/analysis windows required for drum sounds. The first link of the drumGAN thing appears to be an offline thing which generates samples that you then trigger willy-nilly. So I was wondering about the feasibility of training up a model that could then be activated in realtime with drum input, perhaps tying control of the latent dimension offsets to realtime descriptor analysis or something. It’s unclear from the videos, but I guess the .ts files have their structure embedded so nn~ knows what to do with them once loaded. Are there models or examples that are oriented towards realtime generation/resynthesis of short sounds/attacks with minimal latency (sub 10ms)? The .ts files I did find after chasing through the discussion on github that @bledsoeflute mentioned don’t seem to be oriented towards that sort of thing at all.
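
To make that concrete, here’s the kind of patch-logic I’m picturing, purely as a sketch - the model name and which latent dimension does what are invented, and I’m assuming the exported .ts exposes encode/decode the same way nn~ does:

import torch

model = torch.jit.load("drums.ts").eval()       # hypothetical drum-trained model

def process_hit(buffer, brightness):
    # buffer: (1, 1, N) tensor holding a short drum hit
    # brightness: a 0..1 realtime descriptor (e.g. normalized spectral centroid)
    with torch.no_grad():
        z = model.encode(buffer)                # latent frames: (1, dims, frames)
        z[:, 0, :] += brightness * 2.0 - 1.0    # nudge one latent dim with the descriptor
        return model.decode(z)                  # resynthesize the tweaked latents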

Mainly wondering if the paradigm of RAVE/nn~ is feasible for realtime/minimal latency use, or if that kind of thing isn’t yet “a thing”.

Every .ts file I have trained or used has latency, even those trained with the “no latency” parameter in Colab. I am still fuzzy about a lot of those parameters, and am not even sure whether “no latency” refers to live audio and the nn~ forward function.
So as far as I have been able to tell, it is not feasible for real time use if you need minimal latency. However, if latency is not an issue (or is something you can play off of), then it can be pretty interesting, I think. I have a project with a tubist this week and we are going to try out a bunch of these .ts files so am hoping for some raunchy stuff.

I should add I haven’t tried the VST version.

3 Likes

In testing it now, there is indeed a hearty amount of latency. I guess there was a window size to play with, but it didn’t seem to want to go lower than 2048 (though you can oddly set it to 0). Even at 2048 my CPU (on my shitty mac mini) was choking on producing audio at all, much less dealing with the latency.

I wonder what the “faster than realtime” performance means then. Maybe some of how the algorithm deals with material internally or something?

Just thinking/wondering if it’d be possible to play with how things work, as I’ve gotten quite usable results from analyzing 256 (or 512) samples, and then doing stuff with that. So having discrete short analysis frames rather than a constant rolling 4096 window etc…
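
One thing I might try offline is a crude latency check: push a click through the .ts in Python and see where the response lands. This assumes forward() on the exported model behaves like nn~’s forward method and that it’s a 44.1k model, neither of which I’ve verified:

import torch

model = torch.jit.load("wheel.ts").eval()
sr = 44100

click = torch.zeros(1, 1, sr)                   # one second of silence
click[0, 0, 0] = 1.0                            # single-sample impulse at t=0
with torch.no_grad():
    out = model.forward(click)

peak = int(out.abs().argmax())                  # where the loudest response lands
print(f"~{1000 * peak / sr:.1f} ms apparent delay")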

1 Like

Good to hear others are having the same questions :slight_smile:
Some .ts files have less latency than others… but at the moment there’s nothing I’d use realtime on stage…

Are you using the RAVE VST or nn~ in Max? Just curious :slight_smile:
Did you find a place in his code to change the window size? I made a cursory check of his GitHub and couldn’t find the relevant code for that setting.
Tomorrow (Tuesday Oct. 25th) I have a short meeting with Antoine Caillon and will put this and other questions to him. I’ll include the latency issue :slight_smile:

2 Likes

I was trying it with nn~ just in Max. For the window size, I think it’s mentioned on one of the tabs, but you put it in as an argument. The object doesn’t have any reference or attributes, so it’s hard to know what is tweakable.

Ah great. Let us know if/what he says!

1 Like

Ah yes, now I know what you mean, the window size is an argument directly in the nn~ object.
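If I remember the help file right, it’s something like nn~ wheel.ts forward 4096 - model, then method, then buffer size - so changing the window seems to mean recreating the object rather than sending it a message (though I may be misremembering the argument order).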
Will let you know how the meeting goes!

1 Like

So I spoke with Antoine today. About the latency: it can go as low as 40 milliseconds, but it depends on how the model was trained. If the “no latency” parameter is not checked, you will get a better model, but with up to 600 milliseconds of latency. I suspect the available .ts models are trained this way, because it was more important to have a better representation of the audio.
The window size on nn~ does indeed have some hard constraints: it won’t go lower than 1024, since it needs that many samples to produce a single latent point. On the macOS build you can increase the size as much as you want; on Windows you can’t.
He is planning a new version of the code and of nn~, with more features and multichannel support.
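
Just to put numbers on that 1024-sample minimum (assuming 44.1k or 48k models):

# rough buffering cost of one latent point at the minimum window size
for sr in (44100, 48000):
    print(sr, round(1000 * 1024 / sr, 1), "ms per 1024-sample frame")
# ~23.2 ms at 44.1k, ~21.3 ms at 48k, so the ~40 ms figure presumably
# includes the model's own processing/lookahead on top of the buffering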

1 Like

Thanks for reporting back!

That’s definitely good to know.

40ms is still on the “not really usable for percussion stuff”-side of things, but that’s a far cry from what is there in the existing models.

Curious what the next code base will enable. I guess for me it would be amazing to be able to request/process a single latent point at a time (e.g. giving it a fixed 1024 samples in one go), as that could then be worked around. Presumably that would also be (much) lighter on the CPU as you’re not needing a constant stream of material being generated.
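
Something like this, conceptually - one buffer in, one burst out, nothing streaming in between. The model name is a placeholder and I’m assuming the .ts really maps 1024 samples to one latent point and exposes encode/decode, so this is a sketch of the wish rather than a working recipe:

import time
import torch

model = torch.jit.load("some_model.ts").eval()  # placeholder name

grain = torch.randn(1, 1, 1024)                 # exactly one window's worth of audio
with torch.no_grad():
    t0 = time.perf_counter()
    out = model.decode(model.encode(grain))     # one on-demand encode/decode, no stream
print(f"{1000 * (time.perf_counter() - t0):.1f} ms for a single latent point")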

1 Like

Don’t know if anyone is still toying around with creating models, but in speaking to someone on the RAVE discord they mentioned this:

You have to train the model with the --causal flag. That could degrade the result a bit, but will cut the latency quite a bit, for me it went down from 750ms to around 250ms

Still not anywhere close to realtime usability, but I guess the --causal flag does what the “low latency” parameter is meant to do in training(?).

I also saw lots of comments about latency and the lowest people seemed to get was ~700ms, so I imagine the “40ms” refers just to an FFT window size, which would be added to the general algorithm latency.
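
For my own head, the general reason a causal model should trim latency (this isn’t RAVE’s actual code, just the textbook 1D convolution picture):

import torch
import torch.nn as nn

kernel = 7
x = torch.randn(1, 1, 1024)

# a centred convolution pads both sides, so every output sample "sees" 3 future samples
centred = nn.Conv1d(1, 1, kernel, padding=kernel // 2)

# a causal convolution pads only the past, so no lookahead is needed
causal = nn.Sequential(nn.ConstantPad1d((kernel - 1, 0), 0.0),
                       nn.Conv1d(1, 1, kernel))

print(centred(x).shape, causal(x).shape)  # same output length either way
# stack enough non-causal layers and the accumulated lookahead is the latency we hear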

Training a model now on a 4090… let’s see how it goes.

2 Likes