Visual Corpus Exploration Patch

Hi everyone,

Below is a patch that lets you explore a corpus by analysing it with audio features and projecting those features onto a two-dimensional visual map. You can scrub through the space using your mouse :computer_mouse:

It’s a little rough, but I hope the patch makes sense. Start by dropping in a folder of sounds and it should automatically analyse and plot them. You can then adjust the UMAP parameters and the features used for analysis to experiment further.
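
For anyone who wants to see the general idea outside of Max, here is a rough Python sketch of the same kind of pipeline (per-file features summarised with statistics, then reduced to 2-D with UMAP). It is not what the patch does internally; it assumes librosa and umap-learn are installed, and the folder name is a placeholder.

```python
# Rough sketch of the overall idea: per-file MFCC statistics reduced to 2-D
# with UMAP. Not what the Max patch does internally; "my_corpus" is a
# placeholder folder of audio files.
from pathlib import Path

import librosa
import numpy as np
import umap

files = sorted(Path("my_corpus").glob("*.wav"))
points = []
for f in files:
    y, sr = librosa.load(f, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 coefficients per frame
    points.append(np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)]))  # summarise over time

# numneighbours / mindist in fluid.umap~ play the same role as these arguments
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
xy = reducer.fit_transform(np.array(points))
# xy[i] is the (x, y) position of files[i] on the 2-D map
```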

2d_sample_browsing.maxpat (108.3 KB)

3 Likes

Very cool!

Some super early suggestions:

  • have the UMAP parameter defaults (numneighbours, mindist) show up in the UI, as I had no clue what the starting values were to adjust from (see the sketch just after this list)
  • add polyphony to the audio playback as it sounds like a terrible machine gun while browsing samples
  • remove the @loop attribute as there’s no way to stop playback
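
Purely for reference on that first point, and as an assumption on my part that fluid.umap~ is in the same ballpark, these are the defaults shipped with the Python umap-learn library:

```python
# An assumption for orientation only: umap-learn's defaults, which may or may
# not match what fluid.umap~ actually uses.
import umap

reducer = umap.UMAP()
print(reducer.n_neighbors, reducer.min_dist)  # 15 and 0.1 in umap-learn
```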

It would be great to combine this with the (semi-)oldschool UI for selecting which parameters/stats/etc. to use, rather than looking at a single feature.

This is more of a broader question. The use case here is “dragging a folder of individual samples”, but the internal workflow concatenates everything and then manages everything after that via offsets. I know @tremblap quite likes this workflow, and it does make a ton of sense if you are starting from a single long file and then segmenting as part of the process, but it seems like it would create a load of friction further downstream if you want to adjust parameters, remove individual samples, tweak overlaps, etc.


More specifically, what is gained by automatically concatenating everything into a single monolithic file, to offset the procedural confusion it causes?

I’ll try and be brief to avoid derailing the thread (we can always move this if it erupts). In short, I think it’s swings and roundabouts. The obvious alternative in Max would be to use a polybuffer~. One then still needs to keep housekeeping data (buffer numbers now, rather than offsets; or as well as offsets if you’re also segmenting the individual files).
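
To make the housekeeping trade-off concrete, here is a rough sketch (Python standing in for buffer contents, since the bookkeeping is the same whatever the environment); the names and sizes are illustrative only.

```python
# Illustrative comparison of the two housekeeping schemes.
import numpy as np

clips = [np.random.randn(n) for n in (1000, 2500, 800)]  # stand-ins for three audio files

# Scheme 1: one monolithic buffer, segments addressed by (offset, length),
# like concatenating everything into a single buffer~ and keeping offsets.
mono = np.concatenate(clips)
offsets, cursor = [], 0
for c in clips:
    offsets.append((cursor, len(c)))
    cursor += len(c)
start, length = offsets[1]
second_clip = mono[start:start + length]

# Scheme 2: keep the clips separate and address them by index,
# like individual buffers inside a polybuffer~.
second_clip = clips[1]

# Pruning or re-slicing a clip under scheme 1 means recomputing every offset
# after it; under scheme 2 it is just dropping or replacing one entry.
```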

As Owen said, the takeaway is swings and roundabouts. Personally, I prefer to chuck stuff in a polybuffer~ and go from there, having had my audio sliced already, with the audio files themselves carrying the meaningful information about where each segment starts and ends. This patch emerged from a workshop with absolute beginners, many of whom were not familiar with buffer~, let alone polybuffer~, and I opted to show them one way that was relatively ‘anything in, segments out’. I should also add the context that this isn’t meant to be the ‘canonical FluCoMa corpus exploration patch’. It’s one instance of what one might do with the tools, and a possible arrangement of objects toward a single method of decomposition.

fascist

Again, a consequence of the teaching context: having sounds play repeatedly means you can compare them and talk over them while they play. Chuck an attrui on if you wanna stop it easily.

That one’s my woopsie and it should be fixed.

I’d be curious to see how the patch fares with some one-shot drum samples :wink:

1 Like

Yeah, that’s pretty tangential, but I personally find the offsets really confusing and unintuitive, particularly if you start pruning out samples as part of the process.

I just wasn’t sure if there was some memory optimization that meant a single chonky buffer was better than the individual buffers inside a polybuffer~, or something like that.

The other bits are definitely simple/easy fixes, was just pointing out things on first play with the patch.

Also, the offset numbers get really big, which makes them hard to parse at a cursory glance IMO. We’re on the same page here :stuck_out_tongue:

Yeah nah, nothing fancy like that. I could whip up a polybuffer~ example of how I would do it (with @blocking 0) if that would be helpful to show the other side too?

1 Like

This has a pretty stripped back version of polyphonic playback:

Don’t know how easy the grafting would be, though, since the analysis expects offsets into a single buffer.

I’d peg it as a 4/10 on the faff scale.

2 Likes

Alright here we go:

2d_sample_browsing-rodstyle.maxpat (99.3 KB)

Polyphonic playback with mc for Rod :wink:, plus the ability to drop a folder of audio files in to fill up a polybuffer~ for the corpus.

Nice!

Still no default params though :frowning:
Screenshot 2021-12-13 at 10.00.54 pm

And the speedlim-less-ness of the playback still leaves me with some machine gun shivers.

2d_sample_browsing-rodstyle.maxpat (100.1 KB)

1 Like

This is all good banter, my friends, but let’s not forget this is a beginner’s patch. And that you are friends (I don’t think everyone knows how you two know each other :slight_smile: )

The feedback is good, as it will help us home in on what the rest of this series of beginner patches (the 101s) will include, until we get to design the intermediate (201s) and advanced (301s) ones.

Looking forward to seeing the rest of the series!

p

1 Like

Looks nice, James.

Given that visualization is the focus here, I would definitely make the LCD larger (double it in size for instance). Some of the colors don’t show up very well - especially the yellow. Might be nice to use a perceptually uniform colormap (which you could grab from somewhere like matplotlib).
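
If it helps, here is one way to pull a perceptually uniform colormap out of matplotlib and turn it into 0-255 RGB triplets you could paste into the patch as a lookup table. It assumes a reasonably recent matplotlib, and viridis is just one example of a uniform map.

```python
# Grab a perceptually uniform colormap and convert it to 0-255 RGB triplets
# for use as a lookup table (e.g. in a coll) in the patch.
import matplotlib
import numpy as np

cmap = matplotlib.colormaps["viridis"]
steps = np.linspace(0.0, 1.0, 64)
rgb = (cmap(steps)[:, :3] * 255).astype(int)  # drop the alpha channel
for r, g, b in rgb:
    print(r, g, b)
```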

Finally, I agree with Rod that this patch would greatly benefit from letting the user either work with UMAP or pick plain ol’ descriptors to plot. For instance, X=loudness, Y=zero crossings. Or, more interesting: X=loudness, Y=1-D UMAP of MFCCs. The reason being: you can learn a lot about which descriptors are interesting through visualization and browsing. However, I concede that this is more complicated from a UI point of view. Always a delicate balance.
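
As a sketch of that mix-and-match idea (again assuming librosa and umap-learn, with a placeholder folder name): plot a plain descriptor on X and a 1-D UMAP of MFCC statistics on Y.

```python
# Mix-and-match axes: a plain descriptor (loudness) on X, a 1-D UMAP of MFCC
# statistics on Y. "my_corpus" is a placeholder folder of audio files.
from pathlib import Path

import librosa
import numpy as np
import umap

files = sorted(Path("my_corpus").glob("*.wav"))
loudness, mfcc_stats = [], []
for f in files:
    y, sr = librosa.load(f, sr=None, mono=True)
    loudness.append(float(np.mean(librosa.amplitude_to_db(librosa.feature.rms(y=y)))))
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    mfcc_stats.append(np.concatenate([m.mean(axis=1), m.std(axis=1)]))

x = np.array(loudness)                                              # descriptor axis
y_axis = umap.UMAP(n_components=1).fit_transform(np.array(mfcc_stats))[:, 0]
# plot (x[i], y_axis[i]) for each file instead of a full 2-D embedding
```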

2 Likes

That is in the follow-up series, as one way of dealing with a high dimension count. Stay tuned!

@rodrigo.constanzo @jamesbradbury
A disadvantage of polybuffer~ is the constant crashing in Max for Live. It is just unusable.

1 Like

YESSSS I now have an official reason to not like it :slight_smile: It has advantages (dynamic allocation and batch handling) but I had an irrational hatred for it - now you’ve vindicated me :slight_smile:

Seriously though, a lot of our examples use a single long buffer, which works well in other contexts. As for the crashes, @weefuzzy is an M4L power-(ab)user, so he might have a trick or two to share.


2 Likes

First of all, thanks to the FluCoMa team for these wonderful tools!

I am working on a piano timbre analysis patch in Max and I am seeking some advice. I have been experimenting with the “visual corpus exploration” patch and followed the video tutorial series on “creating a 2D corpus explorer”. Starting from a corpus of piano recordings, I have created a 3D representation of the timbre space based on MFCCs and UMAP dimensionality reduction.

Now I would like to analyze an incoming live audio stream and perform the same kind of analysis (novelty or onset segmentation, MFCC computation, statistical analysis, and UMAP mapping) so that I can map the current timbre of the live piano onto the 3D space created by the mapping learned on the corpus.

Initially, I was planning to use the signal in/out tools instead of the buffer in/out ones, but I realized that fluid.bufstats~ and fluid.stats~ do not perform the same kind of analysis. So I am wondering if the correct approach is to somehow record the live incoming audio into a buffer, generate the slices in real time, and then perform the analysis with the buffer tools. Is this the appropriate approach?

Anyway, I am not sure whether someone has already worked on this kind of approach (learning a mapping with the corpus explorer and using it for real-time audio). Any advice on how to tackle this will be more than welcome. Thanks in advance!

Hello and welcome!

What you are interested in is part of the large family of ‘audio query’ approaches, and there are various ways of doing it. It is very exciting and I do it all the time, as I get very interesting results.

This is definitely one way to do it. I like to call it the Just-In-Time approach, and there is an example on drums in the examples folder of the package.

A simpler approach is to use the first two channels of fluid.bufstats~ (mean and standard deviation) to match the fluid.stats~ output (which has both). We have some patches that do this too, and a tutorial coming online hopefully in the next month or two.

One important thing: MFCCs are good for capturing timbral diversity, but if you use them on a narrow timbral space (piano is quite consistent), then trying a higher number of coefficients (13 becomes 20, for instance) could help too.
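
For what it’s worth, here is a rough Python analogue of that matching step, assuming the corpus mapping was fitted with umap-learn as in the sketches further up; the fluid objects work differently, and the file names are placeholders.

```python
# Rough analogue of matching a live slice against a mapping learned on the
# corpus: summarise the slice's MFCC frames with the same statistics used on
# the corpus (mean + standard deviation), then project with the already-fitted
# reducer. File names and array shapes are placeholders.
import librosa
import numpy as np
import umap

N_MFCC = 20  # a few more coefficients can help in a narrow timbral space

def slice_stats(y, sr):
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])  # same stats as the corpus

corpus_stats = np.load("corpus_stats.npy")              # one row of stats per corpus slice
reducer = umap.UMAP(n_components=3).fit(corpus_stats)   # the learned 3-D map

# "captured_slice.wav" stands in for live audio captured between two onsets
live, sr = librosa.load("captured_slice.wav", sr=None, mono=True)
point_3d = reducer.transform(slice_stats(live, sr)[np.newaxis, :])
```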

Finally, there is a more involved way to do it, which is explained in the article on @tutschku’s piece written by @jacob.hart here: Learn FluCoMa

===
So that is a lot of info :slight_smile: What I recommend is to check the JIT example (JIT-NMF-Classifier) in the folder, and start a new thread if you have questions on it. I’ll reply quickly I promise :slight_smile:

I hope you have fun!

1 Like

Thank you very much for these hints!
I will work on them.

1 Like