Below is a patch that lets you explore a corpus by analysing it with audio features and mapping those features onto a two-dimensional visual map. You can scrub through the space using your mouse.
It's a little bit rough, but I hope the patch makes sense. Start by dropping in a folder of sounds and it should automatically do something. You can then configure the parameters of UMAP and the features to experiment further.
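For anyone who wants the gist of the pipeline outside of Max, here is a rough Python sketch of the same idea (features per file → UMAP to 2D → nearest-neighbour lookup under the mouse), using librosa, umap-learn and scikit-learn. The folder path and parameter values are placeholders, not the defaults baked into the patch.

```python
# Conceptual sketch of the patch's pipeline, NOT the Max patch itself.
# Assumes a folder of WAV files and that librosa, umap-learn and scikit-learn are installed.
import glob
import numpy as np
import librosa
import umap
from sklearn.neighbors import NearestNeighbors

files = sorted(glob.glob("samples/*.wav"))  # hypothetical folder of sounds

# 1. Per-file features: mean + std of 13 MFCCs (roughly the role of the MFCC + stats stage)
feats = []
for f in files:
    y, sr = librosa.load(f, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    feats.append(np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)]))
feats = np.array(feats)

# 2. Reduce to two dimensions (numneighbours / mindist are the equivalent attributes in the patch)
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2).fit_transform(feats)

# 3. "Scrubbing": for a mouse position in the 2D space, find the closest sample and play it
tree = NearestNeighbors(n_neighbors=1).fit(embedding)
mouse_xy = np.array([[0.0, 0.0]])             # stand-in for the mouse coordinates over the LCD
_, idx = tree.kneighbors(mouse_xy)
print("nearest sample:", files[idx[0][0]])
```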
- Have the UMAP parameter defaults (numneighbours, mindist) show up in the patch, as I had no clue what values I was starting from when adjusting them.
- Add polyphony to the audio playback, as it sounds like a terrible machine gun while browsing samples.
- Remove the @loop attribute, as there's no way to stop playback.
It would be great to combine this with the (semi-)oldschool UI for selecting which parameters/stats/etc. to use, rather than looking at a single feature.
This is a broader question. The use case here is “dragging a folder of individual samples”, but the internal workflow then concatenates everything, and everything after that is managed via offsets. I know @tremblap quite likes this workflow, and it does make a ton of sense if you are starting from a single long file and segmenting as part of the process, but it seems like it would create a load of friction further downstream if you want to adjust parameters, remove individual samples, tweak overlaps, etc.
More specifically, what is gained by automatically concatenating everything into a single monolithic file? (to offset the procedural confusion it causes)
I'll try and be brief to avoid derailing the thread (we can always move this if it erupts). In short, I think it's swings and roundabouts. The obvious alternative in Max would be to use a polybuffer~. One still then needs to keep housekeeping data (buffer numbers now, rather than offsets; or as well as offsets, if you're also segmenting the individual files).
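To make the housekeeping point concrete, here is a toy sketch (plain Python, nothing to do with the actual patch) of the two bookkeeping schemes: one concatenated buffer plus (onset, length) offsets, versus separate buffers indexed by number, as in a polybuffer~.

```python
# Toy illustration of the two bookkeeping schemes; not FluCoMa code.
import numpy as np

segments = [np.random.randn(n) for n in (480, 960, 720)]      # pretend audio slices

# Scheme A: one monolithic buffer + (onset, length) offsets
mono = np.concatenate(segments)
offsets = []
pos = 0
for seg in segments:
    offsets.append((pos, len(seg)))
    pos += len(seg)
third = mono[offsets[2][0]: offsets[2][0] + offsets[2][1]]    # fetch segment 2 via its offsets

# Scheme B: separate buffers indexed by number, like entries in a polybuffer~
buffers = {i: seg for i, seg in enumerate(segments)}
third_b = buffers[2]                                           # fetch segment 2 directly

# Removing a sample: in Scheme B it is a single deletion; in Scheme A the offsets
# (and anything downstream that stored them) would need rebuilding.
del buffers[1]
```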
As Owen said, the takeaway is swings and roundabouts. Personally, I prefer to chuck stuff in a polybuffer~ and go from there, having had my audio sliced already, with the audio files themselves carrying the meaningful information about where a segment starts and ends. This patch emerged from a workshop with absolute beginners, many of whom were not familiar with buffers~ let alone polybuffers~, and I opted to show them one way that was relatively “anything in, segments out”. I should also add the context that this isn't meant to be the “canonical FluCoMa corpus exploration patch”. It's one instance of what one might do with the tools, and a possible arrangement of objects toward a singular method of decomposition.
fascist
Again, this is a consequence of the teaching context: having sounds play repeatedly lets you compare them and talk over them while they play. Chuck an attrui on there if you wanna stop it easily.
That's my woopsies and should be fixed.
I'd be curious to see how the patch fares with some one-shot drum samples.
Yeah, that's pretty tangential, but I personally find the offsets really confusing and unintuitive, particularly if you start pruning out samples as part of the process.
I just wasn't sure if there was some memory optimization that meant a single chonky buffer was better than individual buffers in a polybuffer~, or something like that.
The other bits are definitely simple/easy fixes; I was just pointing out things from a first play with the patch.
Also, the numbers get really big, which makes them hard to parse at a cursory glance IMO. We're on the same page here.
Yeah nah, nothing fancy like that. I could whip up a polybuffer~ example of how I would do it (with @blocking 0) if that would be helpful to show the other side too?
This is all good banter, my friends, but let's not forget this is a beginner's patch. And that you are friends (I don't think everyone knows how you two know each other).
The feedback is good, as it will help us home in on what the rest of this series of beginner patches will include (the 101s) before we get to design the intermediate (201s) and advanced (301s) ones.
Given that visualization is the focus here, I would definitely make the LCD larger (double it in size, for instance). Some of the colors don't show up very well, especially the yellow. It might be nice to use a perceptually uniform colormap (which you could grab from somewhere like matplotlib).
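If it helps, a perceptually uniform map such as viridis can be pulled out of matplotlib and dumped as 0–255 RGB triples that you can then index from Max (e.g. via a coll); a quick sketch, with an arbitrary resolution of 64 steps:

```python
# Dump a perceptually uniform colormap (viridis) as 0-255 RGB triples,
# e.g. to paste into a coll and index by the feature you are colouring by.
import matplotlib.pyplot as plt

cmap = plt.get_cmap("viridis")
steps = 64                               # arbitrary resolution
for i in range(steps):
    r, g, b, _ = cmap(i / (steps - 1))   # floats in 0..1
    print(i, int(r * 255), int(g * 255), int(b * 255))
```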
Finally, I agree with Rod that this patch would greatly benefit from letting the user work with UMAP or letting them pick plain ol' descriptors to plot. For instance, X = loudness, Y = zero crossings. Or, more interesting: X = loudness, Y = 1-D UMAP of MFCCs. The reason being: you can learn a lot about which descriptors are interesting through visualization and browsing. However, I concede that this is more complicated from a UI point of view. Always a delicate balance.
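As a rough illustration of that second suggestion (again Python rather than Max, with a placeholder folder path): RMS loudness on one axis, a one-dimensional UMAP of the MFCC statistics on the other.

```python
# Hypothetical sketch: X = loudness (RMS, in dB), Y = 1-D UMAP of MFCC mean/std.
import glob
import numpy as np
import librosa
import umap

def analyse(path):
    y, sr = librosa.load(path, sr=None, mono=True)
    loudness = float(np.mean(librosa.amplitude_to_db(librosa.feature.rms(y=y))))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return loudness, np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

paths = sorted(glob.glob("samples/*.wav"))            # placeholder corpus folder
x_axis, mfcc_stats = zip(*(analyse(p) for p in paths))

# Squash the timbre description down to a single dimension for the Y axis
y_axis = umap.UMAP(n_components=1).fit_transform(np.array(mfcc_stats))[:, 0]
points = np.column_stack([x_axis, y_axis])            # one (x, y) per sample, ready to plot
```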
YESSSS, I now have an official reason to not like it. It has advantages (dynamic allocation and batch handling), but I had an irrational hatred for it, and now you've vindicated me.
Seriously though, a lot of our examples use a single long buffer, which works well in other contexts. As for the crashes, @weefuzzy is an M4L power-abuser so he might have a trick or two to share…
First of all, thanks to the FluCoMa team for these wonderful tools!
I am working on a piano timbre analysis patch in Max and I am seeking some advice. I have been experimenting with the “visual corpus exploration” patch and followed the video tutorial series on “creating a 2D corpus explorer”. Starting from a corpus of piano recordings, I have created a 3D representation of the timbre space based on MFCCs and UMAP dimensionality reduction.
Now I would like to analyze an incoming live audio stream and perform the same kind of analysis (novelty or onset segmentation, MFCC computation, statistical analysis and UMAP mapping) so that I can map the current timbre of the live piano onto the 3D space created by the mapping learned on the corpus.
Initially, I was planning to use the signal in/out tools instead of the buffer in/out ones, but I realized that fluid.bufstats~ and fluid.stats~ do not perform the same kind of analysis. So I am wondering whether the correct approach is to somehow record the live incoming audio into a buffer, generate the slices in real time and then perform the analysis with the buffer tools. Is this the appropriate approach?
Anyway, I am not sure whether someone has already worked on this kind of approach (learning a mapping with the corpus explorer and using it for real-time audio). Any advice on how to tackle this will be more than welcome. Thanks in advance!
What you are interested in is part of the large family of “audio query”, and there are various ways of doing it. It is very exciting and I do it all the time, as I get very interesting results.
This is definitely one way to do it. I like to call it the Just-In-Time approach and there is an example on drums in the example folder of the package.
A simpler approach is to use the first two channels of fluid.bufstats~ (mean and standard deviation) to match the fluid.stats~ output (which has both). We have some patches that do this too, and a tutorial coming online hopefully in the next month or two.
One important thing: MFCCs are good for timbral diversity, but if you use them on a narrow timbral space (piano is quite consistent), then trying a higher number of coefficients (going from 13 to 20, for instance) could help too.
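Not FluCoMa code, but to illustrate the matching step being described: if the corpus side was summarized with mean + standard deviation of (say) 20 MFCCs, the live side just needs to produce a vector of the same shape, push it through the already-fitted UMAP with .transform(), and look up its nearest corpus point. A hedged Python sketch, assuming a reducer and nearest-neighbour tree built offline:

```python
# Sketch of the live matching step; assumes these were built offline on the corpus:
#   reducer = umap.UMAP(n_components=3).fit(corpus_stats)            # the learned 3D space
#   tree    = NearestNeighbors(n_neighbors=1).fit(reducer.embedding_)
import numpy as np
import librosa

def live_descriptor(window, sr, n_mfcc=20):
    """Mean + std of MFCCs for one sliced chunk of the live signal,
    mirroring the first two channels (mean, stddev) of the offline stats."""
    mfcc = librosa.feature.mfcc(y=window, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def match(window, sr, reducer, tree):
    point = reducer.transform(live_descriptor(window, sr)[None, :])  # project into the learned space
    _, idx = tree.kneighbors(point)
    return point[0], int(idx[0][0])   # live position in the space + index of its nearest corpus entry
```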
Finally, there is a more involved way to do it, which is explained in the article on @tutschku's piece written by @jacob.hart here: Learn FluCoMa
===
So that is a lot of info. What I recommend is to check the JIT example (JIT-NMF-Classifier) in the folder, and start a new thread if you have questions on it. I'll reply quickly, I promise.