I’m trying to build something along the lines of @jamesbradbury's excellent Max tutorials on 2D corpus exploration, but in SuperCollider. I am creating slices using FluidBufNoveltySlice and then I want to take MFCCs for every slice. I’m getting stuck on what the best approach to this is in SC: FluidBufMFCC or FluidMFCC. If the former, it seems to my understanding that I have to create a features buffer for every slice rather than load all the data into a single buffer (as done in the Max example)? Is it better to use the latter approach and run it at control rate as a function.play? I currently have a short file so that’s no problem, but it might be preferable to do the ‘offline’ Buf approach if I have more source audio.
It is from one of our workshops and well commented (again thanks to @tedmoore).
This should show you the best way to take the MFCCs per slice: specify the startFrame and numFrames inside a loop that iterates over the slice points in pairs.
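To make the "pairs" idea concrete, here is a minimal sketch in plain Python rather than SuperCollider, just to show the arithmetic. The function name `slice_bounds` and the example slice points are made up for illustration; in SC the resulting start and length values would be what you pass as startFrame and numFrames for each analysis.

```python
def slice_bounds(slice_points):
    """Turn a sorted list of slice points (in samples) into
    (start_frame, num_frames) pairs, one per slice.

    Each slice runs from one slice point up to the next one,
    so N slice points give N - 1 slices.
    """
    bounds = []
    for start, end in zip(slice_points, slice_points[1:]):
        bounds.append((start, end - start))
    return bounds


# Hypothetical slice points, as a novelty slicer might report them.
points = [0, 4410, 12000, 30000]
bounds = slice_bounds(points)

for index, (start_frame, num_frames) in enumerate(bounds):
    print(f"slice {index}: start_frame={start_frame}, num_frames={num_frames}")
```

Note that the last slice point only acts as an end marker here, so if you want the tail of the file as a slice you would need to append the total number of frames to the list first.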
Thanks, @tedmoore, it works really well and does exactly what I wanted. One thing that still puzzles me somewhat is why (if I understand this correctly!) we are running the UMAP 2D reduction on the stats (from FluidBufStats). It’s not entirely clear to me why we need the stats. Is it not a good idea to run the UMAP reduction directly on the MFCCs? I know @jamesbradbury explained this in his video, but I’m still not sure I get it!
Good question. The reason BufStats is used is that the sound slices that get analyzed are probably all going to be different lengths, so we’re going to get a different number of analysis frames (FFT frames) for each! (Still 13 MFCC values for each analysis frame, though.) With a different amount of data for each sound slice, it’s not clear how we would compare them to find which ones are similar or different for plotting in the 2D space.
BufStats gives us a set of statistics (from which we use the mean) for each channel in the MFCC analysis buffer. This means that all of the MFCC1 values from a sound slice (which will be however many analysis frames there are) are used to compute the mean MFCC1 value across the whole slice. The same is done for all the MFCC values, so now each slice is described by 13 mean MFCC values. Now we can compare each of these sound slices because they’re each represented by 13 numbers.
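Here is a rough sketch of that averaging step in plain Python (not SuperCollider, and not how BufStats is implemented, just the same idea). The helper names `fake_mfcc_frames` and `mean_per_coefficient` and the made-up data are only for illustration: two slices with different numbers of analysis frames both end up as 13 numbers.

```python
from statistics import fmean

NUM_COEFFS = 13  # 13 MFCC values per analysis frame, as in the thread


def fake_mfcc_frames(num_frames):
    """Made-up stand-in for an MFCC analysis: one list of 13 values per frame."""
    return [[float(frame + coeff) for coeff in range(NUM_COEFFS)]
            for frame in range(num_frames)]


def mean_per_coefficient(frames):
    """Average each MFCC coefficient across all frames of one slice.

    zip(*frames) regroups the data so that all MFCC1 values come
    together, then all MFCC2 values, and so on.
    """
    return [fmean(values) for values in zip(*frames)]


short_slice = fake_mfcc_frames(3)   # a short slice: 3 analysis frames
long_slice = fake_mfcc_frames(5)    # a longer slice: 5 analysis frames

short_summary = mean_per_coefficient(short_slice)
long_summary = mean_per_coefficient(long_slice)

# Different amounts of input data, same size of description.
print(len(short_slice), "frames ->", len(short_summary), "values")
print(len(long_slice), "frames ->", len(long_summary), "values")
```

Once every slice is the same size like this, the summaries can be stacked into one dataset with a row per slice, which is the kind of input a dimensionality reduction needs.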
So, we are running the UMAP on the MFCC values, but it’s on the 13 mean MFCC values for each slice, so that each slice is represented by the same amount of data.
Let me know if that answers the question, and any other questions that might pop up!
Aha, thanks @tedmoore, that makes perfect sense now. I obviously hadn’t fully understood how BufStats works. It looked to me as if we were reducing the MFCCs to 7 stats before running the UMAP, but I now get that it’s just getting an average of each channel.