As the draft name suggests, I want to slice and analyse more than a single folder of audio files and store the analysis results so that I can restore them, instead of redoing the analysis every time, which takes quite a while. Does anyone here have experience with this and can elaborate on it?
My idea is to load all the audio folders contained in a master folder, or to select multiple folders from different directories …
What are the limits, and are there general recommendations for handling large amounts of audio and analysis data?
Audio files with the same bit depth and sample rate won’t hurt, I guess.
This is kind of what fluid.dataset~ is there to help with: because the write and read messages let you save analyses as JSON files, you can build up a collection without having to redo work.
Definitely having uniformity of sample rates can be important; the word length not so much. You might want to think about how (if) you want to handle multichannel sources as well.
What’s probably easiest at the moment with dataset – if you know that you’ll have a bunch of different categories of sounds – is to keep some sub-sets that you can merge if needed. Currently it’s easier to combine datasets than it is to filter them.
Another thing to consider is a convention for the data point IDs: using the source filename in the ID can be really useful for being able to get back to the source audio (e.g. for playback).
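To make the merging and ID points above a bit more concrete, here is a rough Python sketch (outside Max, just to show the logic). It assumes that the JSON written by fluid.dataset~ has a top-level "cols" count and a "data" object mapping each ID to a list of numbers; please check one of your own files before relying on that. The `make_id` / `split_id` naming convention is purely hypothetical.

```python
from pathlib import Path


def make_id(audio_path, slice_index):
    """Build a data point ID such as 'dog.wav-3' (hypothetical convention)."""
    return f"{Path(audio_path).name}-{slice_index}"


def split_id(point_id):
    """Recover (filename, slice_index) from an ID built by make_id."""
    name, _, index = point_id.rpartition("-")
    return name, int(index)


def merge_datasets(*datasets):
    """Merge dataset dicts of the assumed form {"cols": n, "data": {id: [values]}}."""
    merged = {"cols": datasets[0]["cols"], "data": {}}
    for ds in datasets:
        if ds["cols"] != merged["cols"]:
            raise ValueError("datasets have different column counts")
        for point_id, values in ds["data"].items():
            if point_id in merged["data"]:
                raise ValueError(f"duplicate ID: {point_id}")
            merged["data"][point_id] = list(values)
    return merged


animals = {"cols": 2, "data": {make_id("/sounds/animals/dog.wav", 0): [0.1, 0.2]}}
engines = {"cols": 2, "data": {make_id("/sounds/cars/v8.wav", 0): [0.7, 0.9]}}
everything = merge_datasets(animals, engines)
```

Note that IDs based on the bare filename will collide if two folders contain files with the same name; including part of the folder path in the ID would avoid that.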
Thanks for your detailed and helpful reply @weefuzzy!
So I would keep the analysis data separate if I wanted to treat/process, let’s say,
analysed animal sounds differently than car engines, instead of merging the whole datasets right from the beginning. That makes sense.
Regarding the data point ID, I just realized that I have to learn more about that, and definitely also about what kind of dataset processing is possible besides merging.
What I am curious about is whether there is already some abstraction for loading/slicing/analysing multiple audio folders, just to get started and expand on it… maybe in some tutorial or help patcher?
Sure, thanks for pointing that out.
In your patch you use a polybuffer to store audio files and address them separately. Can polybuffer also handle a whole library of, let’s say, 50000 or even more audio files this way…? I ask because I assume that loading audio into a Max buffer or polybuffer (from an internal or external HD) basically loads the audio into RAM. So is the number of files limited by the size of my RAM?
Back to the original question:
how would this “load folders in a folder” thing work?
I guess I need to ask for all folder items in a directory and store their names/paths.
Once those are known, I can iterate over the paths and load the contained files (more or less your patcher) one after another, right?
The limits will be a function of your RAM but also your free disk space (as once RAM is exhausted, the OS will swap out to disk). I don’t know if there’s a count-limit for polybuffer~ (which would depend on the size of the integer they use to index), but I’m pretty confident it would be > 50k.
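As a back-of-the-envelope check on the RAM side, here is a small Python sketch. It assumes Max buffers hold samples as 32-bit floats (4 bytes per sample), so the in-memory size depends on duration, sample rate and channel count rather than on the file’s bit depth. The 10-second stereo file length is a made-up example value, and file headers are ignored.

```python
BYTES_PER_SAMPLE = 4  # assumption: samples held in RAM as 32-bit floats


def buffer_bytes(duration_s, sample_rate=44100, channels=2):
    """Approximate RAM needed for one audio file once loaded into a buffer."""
    frames = int(round(duration_s * sample_rate))
    return frames * channels * BYTES_PER_SAMPLE


def library_bytes(file_count, avg_duration_s, sample_rate=44100, channels=2):
    """Approximate RAM for a whole library of similar files."""
    return file_count * buffer_bytes(avg_duration_s, sample_rate, channels)


def ram_from_disk(disk_bytes, file_bit_depth=16):
    """Scale the on-disk size of uncompressed PCM audio to 32-bit float size."""
    return disk_bytes * 32 // file_bit_depth


# Hypothetical library: 50000 stereo files of 10 seconds each at 44.1 kHz.
total = library_bytes(50000, 10)
print(f"{total / 1e9:.1f} GB")  # 176.4 GB
```

So the file count by itself matters less than the total duration: 50k short one-shots can fit comfortably, while 50k long recordings will push the machine into swap.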
So, that’s maybe an issue of recursively walking a folder tree, and the complexity depends largely on how complex the folder structure is. If you only need one level of nesting, then it might suffice to use something like dropfile with its filetype set to FOLD (iirc) to get a list of subfolders that you can then use with the readfolder message to polybuffer~ to load all the audio files in that particular directory. If it’s more complex, then @a.harker's AHarker_Externals package has a recursivefolder external that will maybe help.
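For anyone who wants to see the folder-walking logic spelled out outside Max, here is a minimal Python sketch using the standard library’s `os.walk`. It is not how dropfile or recursivefolder work internally; the function name, the `max_depth` parameter and the extension list are all assumptions made for illustration.

```python
import os

# Assumed set of extensions to treat as audio; adjust to taste.
AUDIO_EXTENSIONS = {".wav", ".aif", ".aiff", ".flac", ".mp3"}


def find_audio_folders(root, max_depth=None):
    """Return {folder_path: [sorted audio file names]} for folders under root.

    max_depth=0 looks only at root itself, max_depth=1 adds one level of
    subfolders, and None walks the whole tree.
    """
    root = os.path.abspath(root)
    base_depth = root.count(os.sep)
    found = {}
    for folder, subfolders, files in os.walk(root):
        depth = folder.count(os.sep) - base_depth
        if max_depth is not None and depth >= max_depth:
            subfolders[:] = []  # clearing in place stops os.walk descending
        audio = sorted(
            name for name in files
            if os.path.splitext(name)[1].lower() in AUDIO_EXTENSIONS
        )
        if audio:
            found[folder] = audio
    return found
```

Each key of the result is a folder that could then be handed to polybuffer~ one at a time (for example via its readfolder message), which keeps the per-folder grouping available for building IDs later.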
In case anyone here is interested or wants to use it …
here is my expanded version of @weefuzzy’s demo patcher, which now does the “load folders in a folder” thing using @a.harker’s recursivefolder object. I just tested it with an audio archive of ca. 15 GB, ca. 25000 WAV files, 4 levels deep. All files loaded properly into the polybuffer… Next week I will test it with bigger archives and see how it works. Prost!
seems that this patch doesn’t work here (mbp m1 2021 - monterey)
Yes, it fills the text object, but not with files from multiple folders. Also, the open dialog only accepts single files?
Something related: I know that some here tend to concatenate multiple files in a single buffer and store the slices in another single buffer. @jamesbradbury’s corpus explorer is based on that idea. Is there a specific reason not to use polybuffers instead, especially when loading a huge number of sound files?
I made a mistake and always used the path included. Change [opendialog fold] and it works like a charm
Yes and no. It all depends on what you want to do with the sounds afterwards. The one-buffer approach has some advantages, the polybuffer approach others. In my case, I use the former in Max because in those moments I like to segment across the boundaries of sounds. I use the latter in SC because I use it more as a ‘sampler’. You can do both in both; it is really flexible. This is why we have the identifier as a string: you can encode however you want what the numbers refer to…
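To illustrate the one-buffer idea, here is a small Python sketch of the bookkeeping involved: where each file lands in the concatenated buffer, and which source files a given slice overlaps. The function names and the frame counts are made up for the example; this is not taken from any of the patches in this thread.

```python
def concat_layout(files):
    """files: list of (name, length_in_frames) in concatenation order.

    Returns a list of (name, start, end) with end exclusive.
    """
    layout = []
    position = 0
    for name, length in files:
        layout.append((name, position, position + length))
        position += length
    return layout


def sources_of_slice(layout, start, end):
    """Names of the source files overlapped by frames [start, end)."""
    return [name for name, s, e in layout if s < end and e > start]


layout = concat_layout([("a.wav", 100), ("b.wav", 50)])
# A slice from frame 90 to 110 straddles the join between the two files.
crossing = sources_of_slice(layout, 90, 110)
```

With one buffer per file there is no such table to keep, but a slice can never span two sounds, which is the trade-off described above.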