Auditory Features - Synthesis Parameters Correlation

Hey all FluCoMa users,

I am new to the toolset - despite downloading it ages ago!!!

I am looking for a way to analyse the output of a system I have developed - the new Pulsar Generator (nuPG: https://www.marcinpietruszewski.com/the-new-pulsar-generator) - and match that analysis with synthesis parameters. I am thinking of a timbre-map sort of space, where a relationship between synthesis parameters and auditory features can be established. The ultimate goal would be to reverse the process: specify a particular auditory description and get a set of matching synthesis parameters. At the moment I am recording audio and control data simultaneously, and was wondering if you have suggestions on where to go with it from there?
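
For concreteness, this is roughly the shape of paired data I have in mind: each row couples the control values that were active with the analysis of the sound they produced. A minimal Python sketch; the parameter and descriptor names are just placeholders, not the actual nuPG controls or any particular analyser:

```python
# Minimal sketch: logging synthesis parameters alongside audio descriptors,
# so each row pairs a parameter setting with the analysis of the sound it made.
# Parameter and descriptor names here are placeholders, not the actual nuPG API.
import csv
import time

PARAM_NAMES = ["trigger_freq", "grain_freq", "amplitude"]    # placeholder subset of controls
DESC_NAMES = ["loudness", "pitch", "centroid", "flatness"]   # placeholder descriptors

def log_frame(writer, params, descriptors):
    """Write one time-stamped row: current parameters + current descriptor frame."""
    writer.writerow([time.time()] + [params[p] for p in PARAM_NAMES]
                    + [descriptors[d] for d in DESC_NAMES])

with open("nupg_dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time"] + PARAM_NAMES + DESC_NAMES)
    # In practice these would arrive from the synth (e.g. via OSC) and from the analyser.
    example_params = {"trigger_freq": 12.0, "grain_freq": 440.0, "amplitude": 0.8}
    example_descs = {"loudness": -18.0, "pitch": 442.1, "centroid": 1450.0, "flatness": 0.2}
    log_frame(writer, example_params, example_descs)
```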

This is an early and exploratory stage, and I would like to hear your suggestions.

best

marcin

Welcome @marcin_pietruszewski

The very short version of this answer is that it’s non-trivial, especially for something whose range of sounds is as broad as your pulsar generator. The current public FluCoMa release will certainly have some helpful stuff, insofar as it has a range of decompositions and descriptors available, but the stuff in the second set of tools we’re currently developing (for building and exploring databases of sounds etc) will provide a fuller framework, once they’re available.

Longer:
The general question of mapping the space of possibilities of a synthesiser or processor is still open (and, IIRC, @rodrigo.constanzo has raised it before on the forum). There are some papers about it, but not many. Stefano Fasciani made a tool using an approach called Extreme Learning Machines (https://ro.uow.edu.au/dubaipapers/752/), and there was also a paper at NIME last year (https://www.nime.org/proceedings/2019/nime2019_paper085.pdf).

There are a number of moving parts to getting something like this to work. Broadly, you’re trying to learn a usable mapping between two multidimensional spaces – the controls of the synth on the one hand, and some set of descriptors that usefully captures the range of possible sounds on the other. The really hard bit, I reckon, is finding that set of descriptors.
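
To make the shape of the problem a bit more concrete, here’s a minimal sketch (Python with scikit-learn, random arrays standing in for real data) of the “reverse” direction: going from a target descriptor vector back to candidate synth parameters with a k-nearest-neighbours regressor. It illustrates the structure rather than recommending that particular model:

```python
# Minimal sketch of the "reverse" mapping: descriptor space -> parameter space.
# X and Y are made-up stand-ins for a real dataset of (descriptor frame, synth setting) pairs.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 8))   # 500 analysed sounds, 8 descriptor dimensions
Y = rng.random((500, 6))   # the 6 synth parameters that produced each sound

# k-NN regression: for a target descriptor vector, average the parameters
# of the k most similar sounds already in the dataset.
model = KNeighborsRegressor(n_neighbors=5).fit(X, Y)

target_description = rng.random((1, 8))     # "I want a sound that is described like this"
suggested_params = model.predict(target_description)
print(suggested_params)                     # a candidate parameter setting to audition
```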

IIRC, Fasciani takes a sort of kitchen-sink approach: he collects loads of features and then does some dimension reduction to try and mitigate the redundancy that this tactic involves. Whilst not aimed specifically at this problem, we explored a different approach for NIME last year, comparing learning the features themselves with a small neural network against using more generic MFCCs (https://pure.hud.ac.uk/en/publications/adaptive-mapping-of-sound-collections-for-data-driven-musical-int, https://github.com/flucoma/FluidCorpusMap). The code for this is a mixture of SC and Python.
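
As a rough, generic illustration of the kitchen-sink-plus-reduction tactic (not Fasciani’s actual code, and random numbers standing in for real descriptor frames):

```python
# Rough illustration of the "collect lots of features, then reduce" tactic.
# The feature matrix here is random; in practice each row would be a frame of
# stacked descriptors (MFCCs, spectral shape, loudness, pitch, ...).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
features = rng.random((2000, 40))   # 2000 frames x 40 assorted features

# Standardise so no single feature dominates, then project to a few
# dimensions that keep most of the variance (and shed much of the redundancy).
scaled = StandardScaler().fit_transform(features)
pca = PCA(n_components=4)
reduced = pca.fit_transform(scaled)

print(reduced.shape)                    # (2000, 4)
print(pca.explained_variance_ratio_)    # how much each kept dimension accounts for
```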

What’s still a very open question is how to adequately account for the morphological character of sounds. With something like the Pulsar Generator, some of its states are characterised by emergent rhythms and so forth, yes? In which case, per-frame features on their own can’t describe the temporal relationships we might hear. One thing I’ve been playing with semi-privately (with some input from @jamesbradbury and @d.murray-rust) is using the autocorrelation of features to try and capture rhythmic differences, albeit inconclusively.
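
For what it’s worth, the autocorrelation idea is simple in outline. A numpy sketch on a made-up loudness curve; peaks at non-zero lags point to a repeating pattern at that period:

```python
# Sketch: autocorrelation of a per-frame feature (here a fake loudness curve)
# to expose emergent rhythm. Peaks at non-zero lags indicate repetition at that period.
import numpy as np

hop_seconds = 0.01                        # assume a 10 ms analysis hop
t = np.arange(1000) * hop_seconds
loudness = np.sin(2 * np.pi * 2.5 * t) + 0.3 * np.random.default_rng(2).standard_normal(1000)

x = loudness - loudness.mean()
ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # keep non-negative lags only
ac /= ac[0]                                         # normalise so lag 0 == 1

peak_lag = np.argmax(ac[10:]) + 10                  # ignore the trivial peak at lag 0
print("strongest period (s):", peak_lag * hop_seconds)
```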

Concretely, given what’s currently available, one way to get started would be to explore different features and see whether you and the computer can come to an agreement about how well they describe different types of sound from the Pulsar Generator, perhaps by using something like the existing KMeans quark.
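
In SC that would be the KMeans quark itself, but as a language-agnostic sketch of the idea (scikit-learn, with random numbers standing in for descriptor frames):

```python
# Sketch: cluster descriptor frames and see whether the clusters line up with
# the categories of sound you hear. Random data stands in for real descriptors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
descriptor_frames = rng.random((1000, 5))   # e.g. loudness, pitch, centroid, flatness, ...

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
labels = kmeans.fit_predict(descriptor_frames)

# If you have hand labels ("buzzy", "rhythmic", "tonal", ...) for some excerpts,
# compare them against the cluster assignments to judge whether the features
# are capturing distinctions you actually care about.
print(np.bincount(labels))          # how many frames landed in each cluster
print(kmeans.cluster_centers_)      # the "typical" descriptor values per cluster
```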

Welcome indeed!

The work of @spluta and @tedmoore is also quite relevant, and some of it is shared on this (public) side of the forum. I’m sorry if that sounds like a tease, but we have to devise the interface with a first generation of composers, and then we release to the world :wink:

In the meantime, you can check out their work, and also explore Wekinator. The tutorials on Kadenze are free, and are very useful for getting your head around what machine learning can and can’t do for you in this case.

I hope this helps!

Thanks a lot @weefuzzy and @tremblap,

I thought it wouldn’t be a straightforward in-out scenario.

Fasciani’s paper looks like a good place to start wrapping my head around what’s needed and possible.

> With something like the Pulsar Generator, some of its states are characterised by emergent rhythms and so forth, yes?

Yes. These phrase-level things are one of the more interesting aspects of the technique. Additionally, the relationship between the pulsaret waveform and its envelope is quite important to what’s happening spectrally.

I am aware of @spluta’s and @tedmoore’s work. I’ve already found some relevant bits of code here on the forum. Thanks :slight_smile:

m

Definitely still interested in this kind of thing, but haven’t messed with it in ages.

I particularly liked the stuff that @spluta / @tedmoore (and @tremblap) showed, where you have a complex synth being controlled by an x/y interface.

My main interest was having a few “pre-baked” control schemes for each of my DSP modules, so I can control each one with one, two, or three parameters depending on the kind of interface(s) I’m working with at the time, and know I can just tap into that control stream and have a fairly useful range of control over the module.

As @weefuzzy pointed out at the time, something that does DSP (vs a synth/generator) is a lot more complex, because the input can be infinitely variable on its own, much less what the parameters of the actual module do.

Actually, talking about this reminds me of @jacob.hart’s talk at the last plenary, where he showed some (output) descriptor-space analysis of one of the modules I used in my performance. I believe he used “representative audio” from the performance to analyze the possible output of the (Wavefolder) device I used in parts of it.

Somewhere around here in the talk he gets into that sort of thing:

(around 10:24 if the embedded timestamp jump doesn’t work)

@rodrigo.constanzo do you also have a video showing your hardware box that was doing similar robot-sampling?

On my end, the morphology is certainly the issue. I tried mapping analysis of trumpet playing to my feedback synth. It “works”, but it doesn’t sound so good. The issue I found is that if you are playing linearly on an xy pad or whatever, the parameters of the synth will want to move linearly as well. When I mapped to the trumpet, the analysis was telling the synth to jump from one sonic region to another, which is a linear jump on the trumpet, but not on the synth. It was basically going through a wormhole, and it sounded messed up. The reason is probably that my synth has too many dimensions to do this kind of work. Fewer dimensions will work, but I think it will be boring.

An obvious solution is to lag the data a bit, but I am waiting for the real-time KMeans and KNN stuff to do this.
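
For what it’s worth, the lagging itself is trivial: a one-pole smoother per parameter. A Python sketch of the idea (in practice this would live in SC or Max, of course):

```python
# Sketch: one-pole lag / exponential smoothing on an incoming parameter stream,
# to stop the synth from jumping instantly between distant regions.
class Lag:
    def __init__(self, coeff=0.95):
        self.coeff = coeff      # closer to 1.0 = slower, smoother movement
        self.value = None

    def process(self, target):
        if self.value is None:
            self.value = target
        else:
            self.value = self.coeff * self.value + (1.0 - self.coeff) * target
        return self.value

lag = Lag(coeff=0.9)
for target in [0.0, 1.0, 1.0, 1.0, 0.2]:    # parameter values arriving from the analysis
    print(round(lag.process(target), 3))
```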

Took me a bit to remember what you meant.

But this:

(more info here)

This was very crude/dumb, but that was part of its charm.

In the video above, I think I did every combination (up to 7 simultaneous) of 16 points on a ciat-lonbarde Fourses (oldschool one) and then ran a Max patch that analyzed loudness/pitch/centroid/flatness of each individual connection (I think the analysis window was around 50-100ms each). This was before the time of derivatives too, and since this synth is super chaotic anyway, the analysis isn’t super clean in terms of what would happen afterwards.

Once that analysis is done, it does real-time analysis on the guitar input and “repatches” the synth according to that.
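
In outline, the lookup part of that is something like the sketch below (Python; `analyse()` is a hypothetical placeholder for the actual Max analysis, and the combinations are simplified compared to the real patch points):

```python
# Sketch of the lookup step: a pre-analysed table of (patch combination -> descriptors),
# then match a live analysis frame to the nearest stored combination and "repatch".
# analyse() is a hypothetical placeholder for the real loudness/pitch/centroid/flatness analysis.
from itertools import combinations

import numpy as np
from sklearn.neighbors import NearestNeighbors

points = range(16)                                                   # the 16 patch points
combos = [c for k in range(1, 8) for c in combinations(points, k)]   # up to 7 simultaneous

def analyse(combo):
    # Placeholder: in reality, make the connections, let the synth run,
    # and measure loudness / pitch / centroid / flatness over a 50-100 ms window.
    rng = np.random.default_rng(hash(combo) % (2**32))
    return rng.random(4)

table = np.array([analyse(c) for c in combos])       # one descriptor row per combination
index = NearestNeighbors(n_neighbors=1).fit(table)

live_guitar_frame = np.random.default_rng(4).random((1, 4))   # live analysis of the guitar
_, nearest = index.kneighbors(live_guitar_frame)
print("repatch to combination:", combos[nearest[0][0]])
```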

Thanks again, so much food for thought here.

I am going to start with a basic PS model with a limited number of parameters - pulsaret waveform, envelope, trigger frequency, grain frequency, amplitude and spatial position - and take it from there. I will post bits of code as soon as I get interesting results.
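
In case it’s useful to anyone following along, a rough sketch of how a first dataset could be generated by sampling that reduced parameter set (Python; the names and ranges are illustrative only, not the actual nuPG parameters or units):

```python
# Sketch: randomly sample a reduced parameter set to build a first dataset.
# Names and ranges are illustrative only, not the actual nuPG parameters/units.
# Categorical choices (pulsaret waveform, envelope) would be sampled as integer
# indices into the stored waveform/envelope sets rather than as continuous values.
import csv
import numpy as np

PARAM_RANGES = {
    "trigger_freq": (1.0, 100.0),     # pulses per second
    "grain_freq": (50.0, 5000.0),     # Hz
    "amplitude": (0.0, 1.0),
    "spatial_position": (-1.0, 1.0),  # left .. right
}

rng = np.random.default_rng(5)
n_settings = 200

with open("nupg_settings.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(list(PARAM_RANGES.keys()))
    for _ in range(n_settings):
        writer.writerow([rng.uniform(lo, hi) for (lo, hi) in PARAM_RANGES.values()])

# Each setting then gets rendered, recorded, and analysed, giving the paired
# (parameters, descriptors) data to learn from.
```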

m
