Mapping problem - a potential solution - a journal

tremblap · July 16, 2021, 10:45am

So, I’m writing for violin, and I don’t want to write the dots any other way than by ear, so this is perfect for yet another stab at the impossible musaiking problem.

for now:

My gesture/sound targets are
- real-time bass gestures (played, with all the sandbox#2 & 3 virtuosity-recycling desiderata, some of which have also emerged in @rodrigo.constanzo’s explorations)
- modular synth gestures I recorded with enough pitch and noise to be complex
My sound source corpus is the orchidea-ircam dataset, corrected by @danieleghisi and friends.

Same old same old, I know. But this is fun. I will obviously use Bach later, once I have a good-sounding EDL, but for now I need to get around the usual hurdles.

I have a masterplan, which involves switching queries according to some descriptors as discussed previously and now implemented in @b.hackbarth’s fantastic Audioguide.

But: the switching needs to happen between timbral classes of onsets. The problem is the eternal non-overlapping timbral space of onsets of violin and (bass|synth), so I was puzzled… it is nothing new, very similar to stuff I spoke about in sandbox#3, and to what @rodrigo.constanzo keeps saying about trying to match space A to space B… especially when using UMAP to get a significantly lower-dimensional space and hoping to remove noise in the representation, like @groma and @weefuzzy keep telling me…

So the question: how does one map a latent timbral space to another latent space? Making a space with both datasets (target and source) does not solve the non-overlap problem… and separate mapping is arbitrary… and then the obvious answer came to me when I listened to the fantastic keynote by Rebecca: I can use a simple regressor to do ad hoc mappings!

So my plan is:

1. extract a latent timbral descriptor space of my source corpus (a few hypotheses to test there)
2. extract a latent timbral descriptor space of my target (here too)
3. do a dirty match of values of both smaller dimensions and see where it leads me

I will provide examples in various threads, I think, because each task is fun. But I’m sharing here because the last idea might be a solution to other people’s problems. It was in my face all along but hey! that’s life!

rodrigo.constanzo · July 16, 2021, 7:32pm

Very interesting on a few fronts.

I take it from your comments about AudioGuide-esque forking and pitch/noise, that you are still aiming for some kind of APT thing where you use another descriptor (confidence/flatness or whatever) to fork the querying, and with the “T” being a more complex space now than what you were initially planning (1-2 dimensions).

So that’s kind of cool there as it challenges the “equal balance between the three parameters” idea a bit (if I understood you correctly).

So by this do you mean literal supervised “connecting the dots” between the two timbral spaces?

I guess in this case it would be relatively static once you commit to some mappings as you have finite input, and a finite corpus (for the purposes of this piece/idea), so it needn’t be an extensible framework. That’s part of where I’m struggling with my idea/approach in that I want to be able to have an arbitrary amount of corpora, and the thought of hand-correlating timbral spaces every time seems super tedious (to me).

Very interested in your methodology here, in terms of general analysis/processing pipeline, but also what you plan on doing with the bass material. Will it be methodical (e.g. playing every note and multiple dynamics with multiple attack and sustain types) or more exploratory (“jamming” for long stretches of time and extracting everything from that)?

And obviously most interested in this, as I suppose this is the crux of the whole thing.

jamesbradbury · July 16, 2021, 7:53pm

I guess I’m OOTL and it might be useful to others reading this thread if you can clarify what “forking” a query means?

rodrigo.constanzo · July 16, 2021, 8:08pm

In @tremblap’s original example, it was using something like pitch confidence to determine what the actual matching criteria for a given source/target were. So if the pitch confidence was high, you would then weight the pitch value highly and carry on, whereas if the pitch confidence was low, you could ignore pitch altogether in your query.

Given the overall fluid context, this requires a “fork” of each dataset with and without pitch (and/or other variations/massagings). This works well for this kind of binary forking, but beyond that, it gets a little bit more problematic.
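In text-based code the same decision can be written as a plain conditional rather than as two copies of a dataset. The following is only a minimal sketch of that idea, not the actual patch: the corpus entries, the descriptor choice, the 0.5 confidence threshold and the scaling constants are all made up for illustration.

```python
import math

# Hypothetical corpus entries: (name, pitch in MIDI, loudness in dB, spectral centroid in Hz).
CORPUS = [
    ("vn_arco_A4", 69.0, -20.0, 1800.0),
    ("vn_pizz_D4", 62.0, -26.0, 1200.0),
    ("vn_col_legno", 64.0, -32.0, 4200.0),
]

CONFIDENCE_THRESHOLD = 0.5  # assumed cut-off, to be tuned by ear


def query(target, corpus=CORPUS, threshold=CONFIDENCE_THRESHOLD):
    """Return the name of the corpus entry nearest to `target`.

    `target` is a dict with the keys pitch, confidence, loudness, centroid.
    """
    # The "fork": pitch only takes part in the distance when the analysis trusts it.
    use_pitch = target["confidence"] >= threshold

    def distance(entry):
        _, pitch, loudness, centroid = entry
        # Crude per-descriptor scaling so that no unit dominates (assumed ranges).
        d_loud = (target["loudness"] - loudness) / 60.0
        d_cent = (math.log2(target["centroid"]) - math.log2(centroid)) / 8.0
        d_pitch = (target["pitch"] - pitch) / 12.0 if use_pitch else 0.0
        return math.sqrt(d_loud ** 2 + d_cent ** 2 + d_pitch ** 2)

    return min(corpus, key=distance)[0]


pitched = {"pitch": 62.0, "confidence": 0.9, "loudness": -20.0, "centroid": 1800.0}
noisy = dict(pitched, confidence=0.1)
print(query(pitched))  # pitch takes part in the match
print(query(noisy))    # pitch is ignored, only loudness and centroid count
```

With the same loudness and centroid, the two targets land on different corpus entries purely because of the confidence value, which is the binary fork described above. Adding a second fork means a second condition, which is where the combinatorics start to grow.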

jamesbradbury · July 16, 2021, 8:10pm

Yeah okay that makes sense. I’m not versed in doing this sort of thing in Max, and in my mind it would just be something conditional in text-land rather than ‘forking’ anything, which has other connotations.

rodrigo.constanzo · July 16, 2021, 8:49pm

Well, it was either calling it that or “descriptor pivot reviews”, or PRs for short.

spluta · July 17, 2021, 8:02am

How would the regressor approach be different than reducing the two spaces to equal dimensionality and then just mapping [x, y] to [x, y] within the separate spaces?

I think I was working on the same problem earlier this year when I was trying to make a “descriptor-based ring modulator”. Mapping the player’s input to a multi-dimensional synth. But I never quite got it working the way I wanted it to.

tremblap · July 17, 2021, 9:36am

How dumb of me to post this on the day I unplug for 10 days… here are a few answers, because I needed to come back online for a few minutes and you guys are more interesting than the admin I needed to finish.

yes!

the latter! The idea is to do a few samples of each technique and see where it leads me.

@rodrigo.constanzo said it better than I could. It is a decision tree, and in this case I plan to have 2 decision forks. More on that later; for now, I just need to see how I can map attack techniques.

That is the beauty of neural nets. My source and target 2D spaces won’t match, because UMAP will deform, distort and scale each training dataset to respect the algorithm’s constraints. Each 2D space is learnt, and because the descriptor spaces of source and target are not the same, each will be distorted and scaled differently.

But if I get sensible 2D for each, let’s say that all my pizz are together, and all my gritty stuff together, etc, then I could map clusters to clusters using NN. It will be a very non-linear mapping and might not even work all the time, but should be fun to explore. That was the epiphany of this thread. Like my example 8b - fun and playable if a little loose

Hence the steps above of making a working 2D space for each (2 extracted features from my HD descriptor space). In other words: check the @rodrigo.constanzo @tutschku @balintlaczko visualiser thread, make 2 of those (in 2D instead of 3D for ease of testing first) and, if both are valid, use a small training of a NN to map the 2 reduced spaces to each other.
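As a rough sketch of that last step, here is a tiny hand-rolled multilayer perceptron that learns a 2D-to-2D mapping from a handful of paired points. It is not the code used in this thread: the four correspondences are made-up values standing in for hand-picked points in two already-reduced spaces, and the hidden size, learning rate and epoch count are arbitrary assumptions.

```python
import math
import random

# Hypothetical hand-picked correspondences: (point in target space, point in source space).
# All coordinates are invented and assumed to be normalised to 0..1.
PAIRS = [
    ((0.1, 0.2), (0.8, 0.9)),
    ((0.9, 0.1), (0.2, 0.7)),
    ((0.5, 0.8), (0.5, 0.1)),
    ((0.2, 0.9), (0.9, 0.3)),
]
HIDDEN = 8  # assumed hidden-layer size


def init(seed=0):
    """Random weights for a 2 -> HIDDEN -> 2 network with tanh hidden units."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)]
    b1 = [0.0] * HIDDEN
    w2 = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(2)]
    b2 = [0.0, 0.0]
    return w1, b1, w2, b2


def forward(params, x):
    w1, b1, w2, b2 = params
    h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(HIDDEN)]
    y = [sum(w2[k][j] * h[j] for j in range(HIDDEN)) + b2[k] for k in range(2)]
    return h, y


def predict(params, x):
    return forward(params, x)[1]


def loss(params, pairs=PAIRS):
    """Mean squared error over all pairs."""
    total = 0.0
    for x, t in pairs:
        y = predict(params, x)
        total += sum((y[k] - t[k]) ** 2 for k in range(2))
    return total / len(pairs)


def train(params, pairs=PAIRS, epochs=2000, lr=0.02):
    """Plain stochastic gradient descent; the weight lists are updated in place."""
    w1, b1, w2, b2 = params
    for _ in range(epochs):
        for x, t in pairs:
            h, y = forward(params, x)
            err = [y[k] - t[k] for k in range(2)]
            # Gradient at the hidden layer, computed before w2 is modified.
            dh = [sum(err[k] * w2[k][j] for k in range(2)) * (1 - h[j] ** 2)
                  for j in range(HIDDEN)]
            for k in range(2):
                for j in range(HIDDEN):
                    w2[k][j] -= lr * err[k] * h[j]
                b2[k] -= lr * err[k]
            for j in range(HIDDEN):
                w1[j][0] -= lr * dh[j] * x[0]
                w1[j][1] -= lr * dh[j] * x[1]
                b1[j] -= lr * dh[j]
    return params


params = init()
before = loss(params)
train(params)
after = loss(params)
print(f"mean squared error: {before:.4f} -> {after:.4f}")
print(predict(params, (0.5, 0.5)))  # a point between the training pairs
```

The interesting part musically is what happens between the training points, which is where the mapping will be loose and non-linear. With so few pairs the network can memorise them easily, so how it interpolates depends a lot on the assumed hidden size and training length.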

I’ll try that after I’ve slept for a week

rodrigo.constanzo · July 17, 2021, 12:03pm

Ah right.

I guess it depends a lot on the type of material you are using, so if there are very obvious differentiations like that (pizz clusters, etc…), then mapping “cluster to cluster” could be useful. Do you plan on refining each cluster beyond that? (e.g. mini/sub datasets where you connect two clusters together, but then add some nodes to points within each cluster as well).

Also spitballing here, but could you not also do a semi-automated version of this, where you create clusters (or kmeans your way to them) based on completely different descriptor/data spaces, then, once you have that, compare the clusters on a consistent set of descriptors to match them to each other automagically? You could still gain the benefit of the unique/distorted spaces from UMAP etc…, but then be taken part of the way there in terms of matching everything up.
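A minimal sketch of that semi-automated matching, assuming the clustering has already happened elsewhere and every item carries a cluster label plus a small vector of shared descriptors. The labels, the two shared descriptors and all values are hypothetical, and the greedy nearest-centroid pairing is just one simple way to do the comparison.

```python
import math
from collections import defaultdict


def centroids(items):
    """items: list of (cluster_label, shared_descriptor_vector) -> {label: mean vector}."""
    groups = defaultdict(list)
    for label, vec in items:
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def match_clusters(source_items, target_items):
    """Greedily pair each source cluster with the closest unused target cluster."""
    src = centroids(source_items)
    tgt = centroids(target_items)
    # Every source/target centroid distance, closest first.
    candidates = sorted(
        (math.dist(s_vec, t_vec), s, t)
        for s, s_vec in src.items()
        for t, t_vec in tgt.items()
    )
    pairs, used_s, used_t = {}, set(), set()
    for _, s, t in candidates:
        if s not in used_s and t not in used_t:
            pairs[s] = t
            used_s.add(s)
            used_t.add(t)
    return pairs


# Made-up data: clusters found in each corpus's own space, but described here
# on the same two normalised descriptors (say loudness and spectral centroid).
BASS = [("pizz", [0.2, 0.3]), ("pizz", [0.3, 0.2]),
        ("gritty", [0.8, 0.9]), ("gritty", [0.9, 0.8])]
VIOLIN = [("A", [0.85, 0.8]), ("A", [0.8, 0.9]),
          ("B", [0.25, 0.2]), ("B", [0.2, 0.35])]

print(match_clusters(BASS, VIOLIN))
```

The resulting cluster pairs could then serve as a starting set of correspondences, to be corrected or refined by hand.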

tremblap · July 17, 2021, 12:11pm

The short answer is

I don’t know and I’ll try it
because I don’t want blunt classes but a smooth-ish mapping, so I’ll try a few points of correspondence and hope that with small data I get something to regress

I’ll report back with this low hanging fruit and take it from there.

tremblap · August 8, 2021, 11:27am

ok a first experiment with mapping 2 latent spaces is online (Latent-space mapping example code) - that was a way to test my 3rd point in a more ‘objective’ way than just having 2 spaces mapped and working with the result - that is the next (fun) point