Hello! I’m getting into FluCoMa and I’ve been thinking about what I now understand to be a difficult problem: predicting synthesis parameters from input audio features.
What I’m finding hard to conceptualize is how to make predictions based on time-series data, or rather, how to train a model using time-series data.
Before I go ahead, I wondered if anyone might be able to point me in the right direction. I would like to begin with a simple case: predicting parameters for pitch and amplitude envelopes.
I’m starting by training an MLP on some synthetic data, roughly adapting Ted Moore’s FM synthesis example and the 2D corpus explorer.
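To make the MLP-on-synthetic-data idea concrete, here is a rough sketch of the workflow in Python (this is my own toy illustration, not Ted Moore's actual patch: the FM synth, the feature choices, and all parameter ranges are assumptions, and in Max you'd use FluCoMa's own objects rather than scikit-learn):

```python
# Hypothetical sketch: generate synthetic FM tones, extract a couple of
# simple features, and train an MLP to map features -> synthesis parameters.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

SR = 16000
DUR = 0.25
t = np.linspace(0, DUR, int(SR * DUR), endpoint=False)

def fm_tone(carrier, ratio, index):
    """Simple two-operator FM: carrier phase-modulated at carrier*ratio."""
    mod = index * np.sin(2 * np.pi * carrier * ratio * t)
    return np.sin(2 * np.pi * carrier * t + mod)

def features(sig):
    """Toy features: spectral centroid and RMS amplitude."""
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), 1 / SR)
    centroid = (spec * freqs).sum() / (spec.sum() + 1e-9)
    rms = np.sqrt(np.mean(sig ** 2))
    return [centroid, rms]

# Random synthesis parameters (the training "ground truth").
rng = np.random.default_rng(0)
params = np.column_stack([
    rng.uniform(100, 1000, 500),   # carrier frequency (Hz)
    rng.uniform(0.5, 4.0, 500),    # modulator ratio
    rng.uniform(0.0, 5.0, 500),    # modulation index
])
X = np.array([features(fm_tone(*p)) for p in params])

scaler = StandardScaler().fit(X)
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
mlp.fit(scaler.transform(X), params)

# Predict a (carrier, ratio, index) triple from one feature vector.
pred = mlp.predict(scaler.transform(X[:1]))
print(pred.shape)
```

Note that this mapping is badly underdetermined with only two features (many parameter settings share a centroid), which is part of why the problem is hard; real setups use richer feature sets such as MFCCs.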
There’s (presently) no native way to handle time series (unless you look at one of the experimental builds), but a workaround Jordie Shier and I used was taking snapshots at multiple timeframes, correlating those with synthesis parameters, and then interpolating between those points.
The paper goes into a ton more detail, but there are a bunch of video examples here:
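The snapshot-and-interpolate idea can be sketched in a few lines (my own illustration, not the paper's code; the timeframes and parameter values are invented for the example):

```python
# Capture synthesis-parameter snapshots at a few timeframes, then linearly
# interpolate between them to recover a continuous parameter envelope.
import numpy as np

# Snapshot times (seconds) and the parameters predicted at each one,
# e.g. columns = [mod ratio, mod index] captured at 0, 50, 100, 200 ms.
snap_times = np.array([0.0, 0.05, 0.1, 0.2])
snap_params = np.array([
    [1.0, 0.0],   # at onset
    [2.0, 4.0],   # attack peak
    [1.5, 2.0],   # decay
    [1.0, 0.5],   # tail
])

def interpolated_params(t):
    """Per-parameter linear interpolation between snapshots at time t."""
    return np.array([np.interp(t, snap_times, snap_params[:, k])
                     for k in range(snap_params.shape[1])])

# Halfway between the 50 ms and 100 ms snapshots.
print(interpolated_params(0.075))  # → [1.75 3.  ]
```

In practice each snapshot would come from a model trained on features at that timeframe; the interpolation just smooths between those per-timeframe predictions.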
The Multiple Timescale Morphing example halfway down the page is probably the most relevant here.
Fantastic, thank you! I feel like you’ve saved me some confusion!
I’ve been reading Matthew Yee-King’s paper… I essentially want to replicate this in Max, controlling an FM synth of my own design.
I also looked at your and Jordie Shier’s paper above earlier this year, but I think I needed more knowledge to grasp its relevance. I’ll read it over again!
I’d like to explore both methods; there are certainly advantages to using smaller models, but I’ve also been advised that an RNN might be suitable for my use case.
We focused quite a bit on feature regression and trying to tackle the problem that way, while using something fairly low-tech (a genetic algorithm) to browse the space within Max.
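For anyone unfamiliar with the genetic-algorithm side of this, here's a minimal sketch of the idea (entirely my own toy illustration, not the code from the project: the "features" are a stand-in nonlinear function, and a real version would analyze actual synth output):

```python
# Browse a parameter space with a simple genetic algorithm: candidates are
# parameter vectors, fitness is closeness to a target feature vector.
import numpy as np

rng = np.random.default_rng(1)
N_PARAMS, POP, GENS = 3, 40, 60

def toy_features(p):
    """Stand-in for real audio analysis: a fixed nonlinear map of parameters."""
    return np.array([np.sin(p[0]) + p[1], p[1] * p[2], np.cos(p[2])])

# The sound we want to match, expressed as its feature vector.
target = toy_features(np.array([0.3, 0.7, 0.2]))

pop = rng.uniform(0, 1, (POP, N_PARAMS))
for _ in range(GENS):
    fitness = -np.array([np.linalg.norm(toy_features(p) - target) for p in pop])
    # Keep the best half, refill by mutating random survivors.
    survivors = pop[np.argsort(fitness)[POP // 2:]]
    children = (survivors[rng.integers(0, POP // 2, POP // 2)]
                + rng.normal(0, 0.05, (POP // 2, N_PARAMS)))
    pop = np.vstack([survivors, children])

best = min(pop, key=lambda p: np.linalg.norm(toy_features(p) - target))
err = np.linalg.norm(toy_features(best) - target)
print(err)  # residual feature distance, should be small
```

The appeal of the GA here is that it needs nothing but a fitness function, so it works even when the synth's feature response is too messy to regress directly.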
We didn’t use Dexed at all, but from the algorithms we messed with, something like that seems terrifying given how radically differently it can behave across its different algorithms, so I’d suggest using something with a more fixed topology!