Realtime performance learning / prediction

I am trying to take a live performance signal and predict what the performer is likely to play next, then compare that prediction with what is actually performed. I imagine this would be simplest through some means of downbeat estimation and symbolic key estimation (through chroma analysis?), but I would be interested in opinions on other possibilities.
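For the key estimation side, one common recipe is to correlate a chroma vector against the 24 rotated Krumhansl-Kessler key profiles. A minimal sketch in Python with numpy, assuming the 12-bin chroma vector itself comes from whatever real-time analysis runs on the live input:

```python
import numpy as np

# Krumhansl-Kessler major/minor key profiles, C-based
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_key(chroma):
    """Correlate a 12-bin chroma vector against all 24 rotated profiles."""
    best, best_r = None, -np.inf
    for tonic in range(12):
        for mode, profile in (("maj", MAJOR), ("min", MINOR)):
            # rotate the C-based profile so its tonic lands on `tonic`
            r = np.corrcoef(chroma, np.roll(profile, tonic))[0, 1]
            if r > best_r:
                best, best_r = (NOTES[tonic], mode), r
    return best, best_r
```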

The matching suggested here seems like a start in some respects, but I would be looking for a longer sliding window, and possibly something that updates on everything heard so far to predict larger-scale structure.
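To make "updates on everything so far" a bit more concrete, here is one hedged possibility (a sketch, not a recommendation): quantize each incoming descriptor frame to its nearest prototype and keep an online first-order Markov model of the transitions, so every frame both gets a prediction and updates the model. The prototypes (say, k-means centroids of earlier material) are an assumption here:

```python
from collections import defaultdict
import numpy as np

class OnlinePredictor:
    """First-order Markov model over quantized descriptor frames."""

    def __init__(self, prototypes):
        self.prototypes = np.asarray(prototypes)   # assumed: k-means centroids
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def quantize(self, frame):
        # snap an incoming descriptor frame to its nearest prototype index
        return int(np.argmin(np.linalg.norm(self.prototypes - frame, axis=1)))

    def observe(self, frame):
        """Return (predicted symbol, actual symbol) and learn the transition."""
        sym = self.quantize(frame)
        predicted = None
        if self.prev is not None:
            following = self.counts[self.prev]
            if following:
                predicted = max(following, key=following.get)
            self.counts[self.prev][sym] += 1   # update on everything so far
        self.prev = sym
        return predicted, sym
```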

Something like the actual sample matching listed here sounds neat, but it is likely not the right path, as some form of structural repetition would also have to be determined or pre-ordained for the comparison to be useful. It would also likely require far more processing than live distillation to beat and chroma.

At any rate, I'm quite new to the toolkit, so suggestions of which specific tools to investigate (in addition to any general insights) would be fantastic.


Firstly, welcome!

This is super interesting. I guess a first question: by “prediction” do you mean predicted descriptors of what they may be playing, predicted grains (from a corpus/dataset), or predicted audio (as in SampleRNN-type stuff)?

That will have some massive implications on what is to follow.

I would like to revisit this idea now that I'm much more familiar with the tools, as getting away from a bit-y/granular sound would be amazing, and real-time limitations give you few options short of trying something like this.

I suppose, technically, it wouldn't be crazy difficult to set up; it would just be a matter of finding the right window/tolerance to allow for variation.

But ultimately it would be a matter of finding the nearest match, then querying the next frames of each (incoming/corpus) until they no longer satisfy a “close enough” window, then re-dipping into the corpus for a fresh match and carrying on.
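As a rough sketch of that match-and-follow loop, assuming the corpus is a 2-D array of per-frame descriptors in time order (the `tolerance` name and the "close enough" test are purely illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

class MatchFollower:
    def __init__(self, corpus, tolerance):
        self.corpus = np.asarray(corpus)   # corpus[i] = descriptors of frame i
        self.tree = cKDTree(self.corpus)
        self.tolerance = tolerance         # the "close enough" window
        self.pos = None                    # current playhead into the corpus

    def step(self, frame):
        """Return the corpus frame index to play against this incoming frame."""
        if self.pos is not None and self.pos + 1 < len(self.corpus):
            nxt = self.pos + 1
            # keep following consecutive corpus frames while they stay close
            if np.linalg.norm(self.corpus[nxt] - frame) < self.tolerance:
                self.pos = nxt
                return nxt
        # no longer close enough: re-dip into the corpus for a fresh match
        _, idx = self.tree.query(frame)
        self.pos = int(idx)
        return self.pos
```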

It could also get a lot more clever by doing some descriptor subtraction (à la @b.hackbarth’s AudioGuide), where after the nearest match is found, you subtract the loudness/centroid, query for an additional match based on that, and play both at the same time (or staggered).
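A loose sketch of that subtraction idea (very much a simplification of what AudioGuide actually does, and the toy two-descriptor frames here are an assumption):

```python
import numpy as np
from scipy.spatial import cKDTree

def layered_match(target, corpus, tree):
    """Find one match, subtract it from the target, and match the residual."""
    _, first = tree.query(target)
    residual = target - corpus[first]   # e.g. leftover loudness/centroid
    _, second = tree.query(residual)    # second layer aims to fill the gap
    return int(first), int(second)

# toy usage: frames of (loudness, centroid), both normalised to 0-1
corpus = np.random.rand(1000, 2)
tree = cKDTree(corpus)
print(layered_match(np.array([0.9, 0.7]), corpus, tree))
```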

Ultimately, starting with something, anything, is as good a place as any! So circling back to the initial question will narrow down the objects/tools to start looking at.