Video for my students about MLP and autoencoder

This is what I just put in my paper. Does anyone buy this? Or not?

While Neural Networks (NNs) are equally adept at controlling parameter spaces of any dimensionality, and systems ranging from linear to chaotic, for my own aesthetic purposes it is high-dimensional, non-linear, and chaotic systems where NNs are most expressive. I have designed different NN Synths with parameter spaces ranging from 12 to 50 dimensions, many of which use elements of feedback to invoke complex behavior.

However, though they can be correlated, the parameter space is not to be confused with the sonic space. While the parameter space is the one we can quantify, and the one I will quantify below, the end goal is to create synths where the user is able to traverse a multidimensional timbre space\cite{lerdahl} - the one that the performer feels and the one that we hear - with low-dimensional controllers.
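A minimal sketch of the kind of mapping described above, in case it helps make the discussion concrete: a handful of stored presets pair a low-dimensional controller position with a full synth parameter vector, and an MLP regressor fills in everything between them. The dimensions, data, and library choice (scikit-learn) are illustrative placeholders, not any of the actual NN Synths being discussed.

```python
# Hypothetical sketch: a 2-D controller position mapped to a 16-parameter
# synth via an MLP trained on a few hand-placed presets.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_presets, n_synth_params = 8, 16

# Each preset pairs a 2-D controller position with a full parameter vector.
control_points = rng.uniform(0.0, 1.0, size=(n_presets, 2))
preset_params = rng.uniform(0.0, 1.0, size=(n_presets, n_synth_params))

model = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                     max_iter=5000, random_state=0)
model.fit(control_points, preset_params)

# Moving through the 2-D control space now traces a path through the
# 16-D parameter space, passing near the stored presets along the way.
path = np.linspace([0.1, 0.1], [0.9, 0.9], num=50)
synth_params = model.predict(path)  # shape (50, 16), to be sent to the synth
```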


I do, but the nuance I raised about the preset space is, for me, still different from the parameter space. Maybe that is what you mean by your sonic space? You've made a low-dimensional mapping which might just make sense to you and will most probably induce non-linearities… this is where the Fiebrink approach is so full of potential for me (and your explanation of the difference between the two spaces is very good, if the sonic space is what I called the preset space - your name is better, and maybe make explicit that you, as the curator of that space, induce these most-likely non-linear biases, and that you care about a sort of smoothed and not just strictly interpolated space). Also, the idea of the extrapolated is what I like too (you get values beyond the space). Anyway, I hope this rambling helps a bit :slight_smile:

The preset space is a small subset of the parameter space, non?

It is, but (and this is just in my head) for me the parameter space feels like it has a sense of objectivity - i.e. you keep all but one of the 40 parameters fixed, move that one, and you get something linear or not.

Once one starts to populate presets - arbitrary, subjective mappings in a low-dimensional space - then it feels a lot more chaotic and situated and deffo non-linear.

Here I'll ask @weefuzzy and @groma to help us, each with their perspective on the matter…

I am currently reading Ted's paper, and in it is the Fiebrink quote where she uses the phrase parameter space for what you are calling preset space - like, the space made by one mapping. So I think we are in semantics land now.

What I mean by sonic space is the space we hear, or the sound we would describe with words… as opposed to the parameters that make that space.


We can ask her in July!

Interesting. I'm actually thinking about three related spaces. The parameter space is 'all the knobs on the synth', thus all the parameters the current set of modules requires. The sonic space, as PA points out, is what we hear. Without a NN, in front of a huge analogue synth for example, I'm trying to learn which changes of individual parameters drive the sound in this or that direction.
Then comes dimensionality reduction, where we try to NAVIGATE a lower-dimensional space (2 or 3 dimensions) in order to trace gestures between defined (stored) topological areas and anything in between. I like to think of it as the gestural performance space.

The topological reference in the performance space is only one way. Others would be the quality of the gestures (speed, variation, etc.).


The correlation between the parameter space and the sonic space comes in two flavors: there is a growing knowledge of the relationships for the person changing the parameters, but any listener will perceive change in their own sonic space. It requires a lot of time and practice to realize a particular sonic imagination on, let's say, a large Eurorack. NNs or other means of dimensionality reduction can help to start with a few defined 'islands' of sonic qualities and then explore beyond.

Yes, and a few of us have done some experiments mapping those. @hbrown did explain his approach, IIRC, in his talk (still online). In my sandbox#2 I did three time windows (instant, short-term, and long-term running average), which was super fun and which I want to revisit now. @balintlaczko is asking himself the same questions now, and I proposed he use an MLP to navigate a space of presets of mappings, thus going 2nd degree there, which should be fun to explore too!
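For readers following along, here is a hedged sketch of that "three time windows" idea: one descriptor stream (say, a spectral centroid per analysis frame) viewed as instant, short-term, and long-term running averages. The descriptor data and smoothing coefficients are synthetic placeholders, not the actual sandbox#2 settings.

```python
# Sketch: the same descriptor smoothed at three time scales, so a mapping
# (e.g. an MLP input) can see gesture at instant, short, and long horizons.
import numpy as np

def running_averages(frames, short_alpha=0.2, long_alpha=0.02):
    """Return (instant, short, long) views of a 1-D descriptor stream."""
    short, long_ = np.empty_like(frames), np.empty_like(frames)
    s = l = frames[0]
    for i, x in enumerate(frames):
        s = short_alpha * x + (1 - short_alpha) * s  # short-term EMA
        l = long_alpha * x + (1 - long_alpha) * l    # long-term EMA
        short[i], long_[i] = s, l
    return frames, short, long_

# Synthetic stand-in for a per-frame descriptor such as spectral centroid.
centroid = np.abs(np.random.default_rng(1).normal(size=500)).cumsum() % 1.0
instant, short, long_ = running_averages(centroid)

# Stack the three views as a 3-D feature per frame for whatever mapping follows.
features = np.stack([instant, short, long_], axis=1)
```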

Seems like a reasonable stance, but that's a lot of spaces :smiley:

One thing that makes these conversations murky is the amount of work the word parameter ends up doing. In ML-speak, parameters are the things (weights, biases) learnt by the model, and this is quite likely the first association some readers would make, so it might be worth disambiguating the three (?) sets of parameters at work (input controls, model parameters, synth controls).

These are productive complications! I think the spatial metaphors might have the effect of lulling us into making a virtue of proximity in quite a limited sense, e.g. that small changes in input controls should lead to small adjustments in auditory similarity. But that's not necessarily going to be musically productive, because it implies a particular listening orientation that quite possibly doesn't really figure large in actual performance.

We might well ask instead about how long it takes to get from A to B, which might be sonically distinct but feel musically related, or, if this is being controlled by a physical body doing physical things, how far one has to reach, etc. Even trivial things like producing a convincing sense of accent to gestures might involve very brief visits to quite distinct parts of the sonic space that, nonetheless, need to be done very quickly to work musically as such.


I was just trying to figure out how to say that my goal is almost exactly the opposite of this: small motions == big auditory changes.

FWIW, @a.harker's skepticism (as @jamesbradbury put it) resonates with me, especially when you watch the interpolation happen and it looks like interpolation (the sliders all just move in the correct direction, as one would expect). Which isn't to say that MLPs are not useful; they're super useful for doing this. And there are differences from interpolators that are really beneficial, like being able to train a neural network differently to create a more or less "bumpy" or "wiggly" mapping through a space (more or less fit or overfit), more or less precise to the input points you gave it.
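As a small illustration of that "bumpy vs. wiggly" point, here is a sketch (with made-up data and illustrative settings only) of the same preset points fit by two MLPs: one heavily regularized, which glides smoothly but loosely between the points, and one with more capacity and almost no regularization, which hits the training points more precisely but bends more in between.

```python
# Sketch: two MLPs fit the same presets with different degrees of fit.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=(10, 1))   # 1-D control positions for 10 presets
y = rng.uniform(0, 1, size=(10, 4))   # 4 synth parameters per preset

smooth = MLPRegressor(hidden_layer_sizes=(8,), alpha=1e-1,
                      max_iter=20000, random_state=0).fit(x, y)
wiggly = MLPRegressor(hidden_layer_sizes=(64, 64), alpha=1e-6,
                      max_iter=20000, random_state=0).fit(x, y)

grid = np.linspace(0, 1, 200).reshape(-1, 1)
# smooth.predict(grid) drifts gently between presets; wiggly.predict(grid)
# sticks closer to the training points and wiggles more in between.
print(np.abs(smooth.predict(x) - y).mean(), np.abs(wiggly.predict(x) - y).mean())
```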

Regarding the input controls and output controls:

I think there's a fourth here, which is the sonic space, as very different synth controls could produce very similar sounds, so there's another mapping translation going on:
input controls → model parameters → synth controls → sonic space
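A toy illustration of that chain, where every stage is a stand-in function rather than anyone's actual patch:

```python
# Hypothetical end-to-end chain: controller -> MLP -> synth controls -> sound.
import numpy as np

def controller() -> np.ndarray:            # input controls (e.g. a 2-D pad)
    return np.array([0.3, 0.7])

def mlp(ctrl: np.ndarray) -> np.ndarray:   # the model parameters live in here
    W = np.full((16, 2), 0.1)              # placeholder weights
    return np.tanh(W @ ctrl)               # -> 16 synth controls

def render(params: np.ndarray) -> np.ndarray:   # synth controls -> audio
    return np.sin(np.linspace(0, 2 * np.pi, 512) * (1 + params.sum()))

def describe(audio: np.ndarray) -> float:       # audio -> a point in sonic space
    return float(np.abs(np.fft.rfft(audio)).argmax())

sonic_point = describe(render(mlp(controller())))
```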

As @tutschku was saying, big auditory changes for whom? The audience probably perceives a noisy synth as a noisy synth, probably not that different. And how small is a motion? One centimeter on the iPad? One inch? I'm making a silly comparison here, but the point is they are all small compared to having to grab and turn 50 knobs at once, per:

If one is trying to make small motions create big sound differences, there isn't much difference between moving something a few millimeters and just hitting (and jumping to) a different preset button. With an MLP, of course, one can wiggle their way to that preset, but there isn't much room to wiggle if the distance to it is really small!

//========================

One thing I keep coming back to in this conversation is the organization of the control space (in Hans's example, where in the 2D space should the dot go for the straight line vs. the bell shape?). We keep, essentially, just making this up: this sound maybe goes over here, this one maybe goes over there… but what if there were a way to intelligently organize that space as part of the training? Probably to do this:

But also, it could be any optimization. This would essentially be dimensionality reduction; an autoencoder could be good.

Because the relationship between input controls and output sounds needs to be learned by the performer with each mapping anyway, there's no real skin off my back in putting this optimization in, since the learning process will be the same.

This is what I was getting at here by taking auditory descriptors of the synthesis parameters that I wanted in my dataset and then using dimensionality reduction on those auditory descriptors to try to organize my control input space less arbitrarily:

It wouldn't always be necessary or interesting to do this, but there are situations where I think it would make the whole idea of MLP mapping/interpolating more powerful/intuitive/useful/robust - something in this direction.
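A hedged sketch of that workflow: describe each preset with audio features, reduce those descriptors to 2-D, and use the resulting coordinates as the control-space positions for the MLP, rather than placing the dots by hand. PCA stands in here for the autoencoder, and both the descriptors and the synth parameters are synthetic placeholders.

```python
# Sketch: organize the 2-D control space by sonic similarity of the presets,
# then train the usual MLP mapping from that organized space to the synth.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
n_presets = 20
preset_params = rng.uniform(0, 1, size=(n_presets, 40))  # 40 synth parameters
descriptors = rng.normal(size=(n_presets, 12))           # e.g. MFCC/loudness stats

# Dimensionality reduction places sonically similar presets near each other.
control_points = PCA(n_components=2).fit_transform(descriptors)

mapping = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                       random_state=0).fit(control_points, preset_params)
```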


Thanks for your post - I am happy to try to disagree here, with some quality-value brought back in, or with deep epistemic framework differences (of where knowledge is and how to judge the quality of a given mapping). For instance:

What you say there still implies a better mapping. But for me, as I have said many times, and I think it is important to remember in musicking, the practice and the design cross-pollinate. So a 'better' mapping could be:

  • one more readable for the audience
  • one more magical/mysterious for the audience
  • one more expressive for the performer/composer at a given decision making moment
  • one more inspiring for the performer/composer at a given decision making moment

As you can see, each contradicts the others in some sort of way. I like to use all four all the time: at a given time wanting more control and transparency, and at others enjoying the challenge that the pattern extracted from the data is not what I expected, whether from my assumptions, conscious or not, about machine listening, machine learning, or the implementation of both.

So I think making the knowledge and the tools available to as many as possible enables embracing the creative paradoxes we all have (and need) in musicking, and the situatedness of all these processes. I admire your desire for a single solution, but I have to say that I am not certain it is possible in a creative process where the various parameters (context and sounds and desires) are mutating, and are informed in very non-linear ways by those mutations.

There, @weefuzzy will probably tell me I don't go far enough :wink: Also, there are a few interesting articles talking about this in the politics of knowledge evaluation, and @saguaro has a few too, from a musician's perspective.


Yeah, I haven't watched the video here yet, but certainly the word 'control' is doing a lot of work in this thread. I'm currently a year behind on FluCoMa videos (I had a great binge yesterday), so I'll probably have something smart to say in a year's time on this. Certainly the interpolation thought is an obvious one once you start playing with it…
