Matching Video to Audio

I’ve used FluCoMa to cluster audio for musaicking; I’ll call the result the audio plot. Now I want to use the audio snippets to select video output as well.

I’ve managed to generate a UMAP projection of video clips, which appears as several clusters; I’ll call this the video plot. I could superimpose it over the audio plot, so that when I select an audio clip I can pick the 4 nearest neighbours in the video plot. But the clusters in the audio plot are very different from the clusters in the video plot.

How can I transform the video plot to match the audio plot? It’s not SuperCollider-specific; I’m working more in Python and Jupyter.


This sounds interesting, but I can read your idea two ways:

  1. you want to link the sound to the image, i.e. you browse the audio nearest-neighbour map and play the audio + video of those neighbours
  2. you want both maps to be browsed independently.

I have a solution for both problems, but let me know which one it is :slight_smile:

I was going for 1. But I’d love to hear both.

1 is the simplest :slight_smile: I recommend using your identifiers as position markers in video frames - using matching names for the same ‘slice’ of sound and video.

Then when you match a sound, you get the ID back with the query and that tells you what audio and video to pick.
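To make the shared-ID idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the slice IDs, the 2D coordinates, the file names and the `nearest_ids` helper are made up for illustration, and a brute-force search stands in for whatever nearest-neighbour query you already use on the audio plot.

```python
import math

# Hypothetical data: each slice ID maps to a 2D position in the audio plot.
audio_points = {
    "slice-000": (0.10, 0.20),
    "slice-001": (0.80, 0.75),
    "slice-002": (0.40, 0.90),
}

# The same IDs name the matching video segments
# (file names and start times in seconds are invented).
video_segments = {
    "slice-000": ("clip_a.mp4", 0.0),
    "slice-001": ("clip_b.mp4", 10.0),
    "slice-002": ("clip_c.mp4", 20.0),
}


def nearest_ids(query, points, k=1):
    """Return the k IDs whose positions are closest to the query point."""
    ranked = sorted(points, key=lambda id_: math.dist(query, points[id_]))
    return ranked[:k]


# Query the audio plot, then use the returned IDs to pick the video.
hits = nearest_ids((0.75, 0.80), audio_points, k=2)
for slice_id in hits:
    video_file, start_time = video_segments[slice_id]
    print(slice_id, video_file, start_time)
```

The point is that the video is never searched at all: the query only runs on the audio plot, and the ID that comes back is the key into both collections.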

Does that make sense?

That’s a little different from what I had in mind. The video plot is made up of small 10-second chunks that have been projected with UMAP onto a 2D plot. I could do a nearest-neighbour search on this plot, just like the 2D audio corpus explorer does.

I want to superimpose this plot over the audio plot and, by selecting a node in the audio plot, do a 2-nearest-neighbour search on the video plot. The problem is that the video clusters are different from the audio clusters, so I’d like to transform the video clusters by translate/rotate/skew to roughly match the audio clusters.

How would I go about doing this?

So this is #2, and it is not trivial since there is no good way to do it. There is a thread here where I try to do that - matching a space onto another - in which there is also a link to a machine-learnt curated example in SC.

Let me know if that helps or not. I can try to explain better.

Strangely enough, I’ve read through that before, though it completely slipped my mind. I should try the MLP-based solution.
