Below is a patch that lets you explore a corpus by analysing it with audio features and mapping those features onto a two-dimensional visual map. You can scrub through the space using your mouse
It’s a little bit rough but I hope the patch makes sense. Start by dropping in a folder of sounds and it should automatically do something. You can then configure the parameters of UMAP and the features to experiment further.
have the UMAP parameter defaults (numneighbours, mindist) show up in the UI — I had no clue what the starting values were, so there was nothing to adjust from
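For reference, a minimal sketch of how one might surface those defaults before exposing them in a UI. The values below mirror the defaults of the reference umap-learn implementation (n_neighbors=15, min_dist=0.1); fluid.umap~'s own defaults may differ, so treat them as a starting point. The `umap_params` helper is hypothetical, not part of any FluCoMa API.

```python
# Assumed starting values, taken from the umap-learn library defaults
# (n_neighbors=15, min_dist=0.1); verify against fluid.umap~'s reference.
UMAP_DEFAULTS = {
    "numneighbours": 15,  # local vs. global structure: higher = more global
    "mindist": 0.1,       # minimum spacing of points in the embedding
}

def umap_params(**overrides):
    """Return the default parameters with any user overrides applied."""
    params = dict(UMAP_DEFAULTS)
    unknown = set(overrides) - set(params)
    if unknown:
        raise ValueError(f"unknown UMAP parameter(s): {sorted(unknown)}")
    params.update(overrides)
    return params

# Usage: start from the defaults, then tweak one knob at a time.
params = umap_params(mindist=0.3)
```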
add polyphony to the audio playback as it sounds like a terrible machine gun while browsing samples
remove the @loop attribute as there’s no way to stop playback
It would be great to combine this with the (semi-)oldschool UI for selecting which parameters/stats/etc… to plot, rather than being stuck looking at a single fixed feature.
This is more of a broader question. The use case here is “dragging a folder of individual samples” but then the internal workflow concatenates everything and then everything after that is managed via offsets. I know @tremblap quite likes this workflow, and it does make a ton of sense if you are starting from a single long file and then segmenting as part of the process, but it seems like this would create a load of friction further downstream if you want to adjust parameters, remove individual samples, tweak overlaps, etc…
More specifically, what is gained by automatically concatenating everything into a single monolithic file? (to offset the procedural confusion it causes)
I’ll try to be brief to avoid derailing the thread (we can always move this if it erupts). In short, I think it’s swings and roundabouts. The obvious alternative in Max would be to use a polybuffer~. One still then needs to keep housekeeping data (buffer numbers now, rather than offsets; or as well as offsets, if you’re also segmenting the individual files).
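To make the trade-off concrete, here is a small sketch (illustrative only — the helper names are hypothetical, not FluCoMa API) of the two bookkeeping schemes being debated: one monolithic buffer with per-sample offsets, versus a polybuffer~-style collection of separate buffers. Removing a sample is trivial in the second scheme, while the first requires recomputing every later offset — the "friction" mentioned above.

```python
# Scheme A: one monolithic buffer plus (onset, length) offsets per sample.
concat = []
offsets = {}          # name -> (onset, length)
def add_concat(name, samples):
    offsets[name] = (len(concat), len(samples))
    concat.extend(samples)

# Scheme B: polybuffer~-style, one buffer per sample.
buffers = {}          # name -> list of samples
def add_poly(name, samples):
    buffers[name] = list(samples)

for name, data in [("kick", [0.9, 0.5]), ("snare", [0.7, 0.3, 0.1])]:
    add_concat(name, data)
    add_poly(name, data)

# Removing a sample is one deletion in scheme B...
del buffers["kick"]

# ...but in scheme A every offset after the removed region must shift.
onset, length = offsets.pop("kick")
del concat[onset:onset + length]
offsets = {n: (o - length if o > onset else o, l)
           for n, (o, l) in offsets.items()}
```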
As Owen said, the takeaway is swings and roundabouts. Personally, I prefer to chuck stuff in a polybuffer~ and go from there, having already sliced my audio so that the files themselves carry the meaningful information about where a segment starts and ends. This patch emerged from a workshop with absolute beginners, many of whom were not familiar with buffer~, let alone polybuffer~, so I opted to show them one way that was relatively ‘anything in, segments out’. I should also add the context that this isn’t meant to be the ‘canonical FluCoMa corpus exploration patch’. It’s one instance of what one might do with the tools, and a possible arrangement of objects toward one particular method of decomposition.
Again, a consequence of the teaching context: having sounds play repeatedly lets you compare them and talk over them while they play. Chuck an attrui on if you wanna stop it easily.
That’s my woopsies and should be fixed.
I’d be curious to see how the patch fares with some one-shot drum samples
Given that visualization is the focus here, I would definitely make the LCD larger (double it in size for instance). Some of the colors don’t show up very well - especially the yellow. Might be nice to use a perceptually uniform colormap (which you could grab from somewhere like matplotlib).
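On the colormap point, a minimal stdlib sketch of what "grab it from matplotlib" could look like in practice: linear interpolation between a handful of anchor colours sampled (approximately) from matplotlib's viridis, returning 0–255 RGB triples suitable for colouring points on an LCD. For a faithful map you would export the full 256-entry table from matplotlib itself; the anchor values here are approximations.

```python
# Approximate RGB stops sampled from matplotlib's viridis colormap
# (dark purple -> blue -> teal -> green -> yellow).
VIRIDIS_ANCHORS = [(68, 1, 84), (59, 82, 139), (33, 145, 140),
                   (94, 201, 98), (253, 231, 37)]

def viridis(t):
    """Map t in [0, 1] to an (r, g, b) tuple with 0-255 channels."""
    t = min(max(t, 0.0), 1.0)
    pos = t * (len(VIRIDIS_ANCHORS) - 1)
    i = min(int(pos), len(VIRIDIS_ANCHORS) - 2)
    frac = pos - i
    a, b = VIRIDIS_ANCHORS[i], VIRIDIS_ANCHORS[i + 1]
    return tuple(round(a[c] + (b[c] - a[c]) * frac) for c in range(3))

# Usage: colour each corpus point by a normalised feature value.
colour = viridis(0.5)
```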
Finally, I agree with Rod that this patch would greatly benefit from letting the user work with UMAP or letting them pick plain ol’ descriptors to plot. For instance, X=Loudness, Y=Zerocrossings. Or, more interesting: X=loudness, Y=1-D UMAP MFCCs. The reason being: you can learn a lot about which descriptors are interesting through visualization and browsing. However, I concede that this is more complicated from a UI point of view. Always a delicate balance.
First of all, thanks to the Flucoma team for these wonderful tools!
I am working on a piano timbre analysis patch in Max and I am seeking some advice. I have been experimenting with the “visual corpus exploration” patch and followed the video tutorial series on “creating a 2D corpus explorer”. Starting from a corpus of piano recordings, I have created a 3D representation of the timbre space based on MFCCs and UMAP dimensionality reduction.
Now I would like to analyze an incoming live audio stream and perform the same kind of analysis (novelty or onset segmentation, MFCC computation, statistical analysis, and UMAP mapping) so that I can map the current timbre of the live piano onto the 3D space learned from the corpus.
Initially, I was planning to use the signal in/out tools instead of the buffer in/out ones, but I realized that fluid.bufstats~ and fluid.stats~ do not perform the same kind of analysis. So I am wondering whether the correct approach is to record the live incoming audio into a buffer, generate the slices in real time, and then perform the analysis with the buffer tools. Is that the appropriate approach?
Anyway, I am not sure whether someone has already worked on this kind of approach (learning a mapping with the corpus explorer and using it for real-time audio). Any advice on how to tackle this will be more than welcome. Thanks in advance!
What you are interested in is part of the large family of ‘audio query’ approaches, and there are various ways of doing it. It is very exciting; I do it all the time and get very interesting results.
This is definitely one way to do it. I like to call it the Just-In-Time approach and there is an example on drums in the example folder of the package.
A simpler approach is to use the first two channels of fluid.bufstats~ (mean and standard deviation) to match the fluid.stats~ output (which has both). We have some patches that do this too, and a tutorial coming online hopefully in the next month or two.
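The matching idea above can be sketched outside Max: reduce a run of per-frame MFCC vectors to a mean and a standard deviation per coefficient, i.e. the first two statistics that both fluid.bufstats~ and fluid.stats~ expose. This is an illustrative stdlib sketch, not FluCoMa code; population standard deviation is assumed here, so check which convention the objects actually use before comparing numbers.

```python
import statistics

def mean_std(frames):
    """frames: list of per-frame coefficient lists -> (means, stds),
    one value per coefficient, computed across frames."""
    cols = list(zip(*frames))  # one tuple per coefficient, across frames
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]  # population std assumed
    return means, stds

# Usage: 3 analysis frames of 2 MFCC coefficients each.
frames = [[1.0, 10.0], [3.0, 14.0], [5.0, 18.0]]
means, stds = mean_std(frames)
```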
One important thing: MFCCs are good at capturing timbral diversity, but if you use them on a narrow timbral space (piano is quite consistent), trying a higher number of coefficients (20 instead of 13, for instance) could help too.