Visual Corpus Exploration Patch

Hi everyone,

Below is a patch that lets you explore a corpus by analysing it with audio features and mapping those features onto a two-dimensional visual map. You can scrub through the space using your mouse :computer_mouse:
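For anyone who wants the idea outside of Max, here is a minimal Python sketch of the scrubbing step: given a set of 2D points (one per sound) and a mouse position, pick the closest point. The function name `nearest_point` and the coordinates are made up for illustration; the patch does this with Max and FluCoMa objects, not with this code.

```python
import math

def nearest_point(points, x, y):
    """Return the id of the 2D point closest to (x, y)."""
    return min(points, key=lambda k: math.dist(points[k], (x, y)))

# Hypothetical 2D coordinates, normalised to 0..1
points = {
    "kick": (0.10, 0.20),
    "snare": (0.80, 0.75),
    "hat": (0.45, 0.90),
}

print(nearest_point(points, 0.5, 0.8))  # hat
```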

It’s a little bit rough but I hope the patch makes sense. Start by dropping in a folder of sounds and it should automatically do something. You can then configure the parameters of UMAP and the features to experiment further.

2d_sample_browsing.maxpat (108.3 KB)


Very cool!

Some super early suggestions:

  • show the defaults for the UMAP parameters (numneighbours, mindist), as I had no clue what values I was adjusting from
  • add polyphony to the audio playback as it sounds like a terrible machine gun while browsing samples
  • remove the @loop attribute as there’s no way to stop playback
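
To make the polyphony suggestion concrete, here is a rough Python sketch of round-robin voice allocation, which is one common way to stop each new trigger from cutting off the previous sound. `VoicePool` is a hypothetical name and this is not how the patch is built; it only illustrates the idea.

```python
from itertools import cycle

class VoicePool:
    """Hand out voices in round-robin order so new triggers don't cut off old ones."""

    def __init__(self, num_voices):
        self._voices = cycle(range(num_voices))

    def trigger(self, sample_id):
        # Each trigger takes the next voice; a voice is only reused
        # after all the others have been used in turn.
        return next(self._voices), sample_id

pool = VoicePool(4)
print([pool.trigger(s)[0] for s in ["a", "b", "c", "d", "e"]])  # [0, 1, 2, 3, 0]
```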

It would be great to combine this with the (semi)oldschool UI for selecting what parameters/stats/etc… rather than looking at a single feature.

This is more of a broader question. The use case here is “dragging a folder of individual samples” but then the internal workflow concatenates everything and then everything after that is managed via offsets. I know @tremblap quite likes this workflow, and it does make a ton of sense if you are starting from a single long file and then segmenting as part of the process, but it seems like this would create a load of friction further downstream if you want to adjust parameters, remove individual samples, tweak overlaps, etc…

More specifically, what is gained by automatically concatenating everything into a single monolithic file? (to offset the procedural confusion it causes)

I’ll try and be brief to avoid derailing the thread (we can always move it if this erupts). In short, I think it’s swings and roundabouts. The obvious alternative in Max would be to use a polybuffer~. One still then needs to keep housekeeping data (buffer numbers now, rather than offsets; or as well as offsets if you’re also segmenting the individual files).
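
As a rough illustration of the two kinds of housekeeping, here is a small Python sketch. The file lengths are invented, `concat_offsets` is a hypothetical helper, and the 1-based buffer index is an assumption about how one might number the buffers.

```python
def concat_offsets(lengths):
    """Start/end sample offsets for files laid end to end in one buffer."""
    offsets, start = [], 0
    for n in lengths:
        offsets.append((start, start + n))
        start += n
    return offsets

lengths = [44100, 22050, 88200]  # hypothetical file lengths in samples

# Single long buffer: each file is addressed by (start, end).
print(concat_offsets(lengths))
# [(0, 44100), (44100, 66150), (66150, 154350)]

# Removing the second file shifts every later offset.
print(concat_offsets(lengths[:1] + lengths[2:]))
# [(0, 44100), (44100, 132300)]

# Separate buffers: each file is addressed by an index (assumed 1-based here)
# and keeps its own (0, length) range, so removing one leaves the rest alone.
print([(i + 1, (0, n)) for i, n in enumerate(lengths)])
```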

As Owen said, the takeaway is swings and roundabouts. Personally, I prefer to chuck stuff in a polybuffer~ and go from there, having had my audio sliced already, with the audio files themselves carrying the meaningful information about where a segment starts and ends. This patch emerged from a workshop with absolute beginners, many of whom were not familiar with buffer~ let alone polybuffer~, and I opted to show them one way that was relatively ‘anything in, segments out’. I should also add the context that this isn’t meant to be the ‘canonical FluCoMa corpus exploration patch’. It’s one instance of what one might do with the tools, and a possible arrangement of objects toward a singular method of decomposition.

fascist

Again, a consequence of teaching it is to have sounds repeatedly play so that you can compare sounds and talk over them while they play. Chuck an attrui on if you wanna stop it easily.

That’s my woopsies and should be fixed.

I’d be curious to see how the patch fares with some one-shot drum samples :wink:


Yeah, that’s pretty tangential, but I personally find the offsets really confusing and unintuitive, particularly if you start pruning out samples as part of the process.

I just wasn’t sure if there was some memory optimization that meant a single chonky buffer was better than individual polybuffers or something like that.

The other bits are definitely simple/easy fixes, was just pointing out things on first play with the patch.

Also the numbers get really big which makes it hard to parse from a cursory glance IMO. We’re on the same page here :stuck_out_tongue:

Yeah nah, nothing fancy like that. I could whip up a polybuffer~ example of how I would do it (with @blocking 0) if that would be helpful to show the other side too?


This has a pretty stripped back version of polyphonic playback:

Don’t know how easy the grafting would be for the analysis since it expects offsets of a single buffer though.

I’d peg it as a 4/10 on faff scale.


Alright here we go:

2d_sample_browsing-rodstyle.maxpat (99.3 KB)

polyphonic playback with mc for Rod :wink: as well as being able to drop in a folder of audio files to fill up a polybuffer~ for the corpus.

Nice!

Still no default params though :frowning:
[Image: Screenshot 2021-12-13 at 10.00.54 pm]

And the speedlim-less-ness of the playback still leaves me with some machine gun shivers.
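
For what rate-limiting the triggers could look like, here is a simplified Python sketch. `RateGate` is a hypothetical name, the event times are invented, and this gate simply drops triggers that arrive too soon, which is not necessarily the exact behaviour of Max's speedlim object.

```python
class RateGate:
    """Let a trigger through only if min_interval seconds have passed."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last = None  # time of the last trigger that was let through

    def allow(self, now):
        if self.last is None or now - self.last >= self.min_interval:
            self.last = now
            return True
        return False

gate = RateGate(0.1)
times = [0.0, 0.02, 0.05, 0.12, 0.15, 0.30]  # hypothetical mouse event times (s)
print([t for t in times if gate.allow(t)])  # [0.0, 0.12, 0.3]
```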

2d_sample_browsing-rodstyle.maxpat (100.1 KB)


This is all good banter, my friends, but let’s not forget this is a beginner’s patch. And that you are friends (I don’t think everyone knows how you two know each other :slight_smile: )

The feedback is good as it will help us home in on what the rest of this series of beginner patches will include (the 101s) until we get to design the intermediates (201s) and advanced (301s).

Looking forward to seeing the rest of the series!

p


Looks nice James.

Given that visualization is the focus here, I would definitely make the LCD larger (double it in size for instance). Some of the colors don’t show up very well - especially the yellow. Might be nice to use a perceptually uniform colormap (which you could grab from somewhere like matplotlib).

Finally, I agree with Rod that this patch would greatly benefit from letting the user work with UMAP or letting them pick plain ol’ descriptors to plot. For instance, X=Loudness, Y=Zerocrossings. Or, more interesting: X=loudness, Y=1-D UMAP MFCCs. The reason being: you can learn a lot about which descriptors are interesting through visualization and browsing. However, I concede that this is more complicated from a UI point of view. Always a delicate balance.
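
To sketch what plotting plain descriptors might involve, here is a small Python example that computes two values per sound, which could then be used as X and Y. It uses RMS level in dB as a crude stand-in for loudness (a proper loudness measure would be perceptually weighted), and the test signals and function names are invented for illustration.

```python
import math

def rms_db(samples):
    """RMS level in dB relative to full scale (a rough loudness proxy)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log10(0) on silence

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    pairs = zip(samples, samples[1:])
    crossings = sum(1 for a, b in pairs if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)

def sine(freq, sr=44100, n=4410, amp=0.5):
    """Hypothetical test signal: n samples of a sine wave."""
    return [amp * math.sin(2 * math.pi * freq * i / sr) for i in range(n)]

low, high = sine(100), sine(2000)
for name, sig in [("low", low), ("high", high)]:
    # One (x, y) pair per sound: x = level in dB, y = zero-crossing rate
    print(name, round(rms_db(sig), 1), round(zero_crossing_rate(sig), 4))
```

The two sines have the same level but very different zero-crossing rates, so they would land at the same X and far apart on Y.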


That is in the follow-up series, as one way of dealing with a high dimension count. Stay tuned!

@rodrigo.constanzo
@jamesbradbury
A disadvantage of polybuffer~ is the constant crashing in Max for Live… It is just unusable.


YESSSS I now have an official reason to not like it :slight_smile: It has advantages (dynamic allocation and batch handling) but I had an irrational hatred for it, and now you’ve vindicated me :slight_smile:

Seriously though, a lot of our examples use a single long buffer, which works well in other contexts. As for the crashes, @weefuzzy is a M4L powerabuser so he might have a trick or two to share…