@tremblap, here’s my broader thinking / context around this.
One thing I’ve started doing: when I make a dataset and save it to disk, I also save a CSV file next to it in the same folder with the “column headers”, so that if I come back to it later I know what’s in that dataset. (I’ve also been adding a similar file with the “source file paths”.)
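Here is a minimal sketch of the sidecar-file idea, written in Python rather than SuperCollider so that it stays self-contained. The helper names (`save_sidecars`, `load_headers`) and the `<name>-headers.csv` / `<name>-sources.csv` file names are hypothetical. The dataset itself is assumed to be saved separately with whatever you already use.

```python
import csv
import os
import tempfile


def save_sidecars(folder, name, headers, source_paths):
    """Write hypothetical sidecar files next to a saved dataset.

    <name>-headers.csv : a single row with the column names, in dataset order
    <name>-sources.csv : one source file path per row
    """
    os.makedirs(folder, exist_ok=True)
    headers_path = os.path.join(folder, f"{name}-headers.csv")
    sources_path = os.path.join(folder, f"{name}-sources.csv")
    with open(headers_path, "w", newline="") as f:
        csv.writer(f).writerow(headers)
    with open(sources_path, "w", newline="") as f:
        writer = csv.writer(f)
        for path in source_paths:
            writer.writerow([path])
    return headers_path, sources_path


def load_headers(headers_path):
    """Read the header row back as a list of column names."""
    with open(headers_path, newline="") as f:
        return next(csv.reader(f))


# Usage: round-trip the headers through a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    headers_file, _ = save_sidecars(tmp, "corpus", ["specCent-mean", "mfcc01"], ["/a.wav", "/b.wav"])
    restored = load_headers(headers_file)
print(restored)  # ['specCent-mean', 'mfcc01']
```

Going through the `csv` module instead of joining strings by hand means a descriptor name that happens to contain a comma still round-trips correctly.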
Usually these datasets come from quite large, “analyze everything” approaches, and later I can decide which features I want to use, selecting columns with `FluidDataSetQuery` (I know what the columns are from my CSV file of headers). To do this, I actually use the CSV file as a little lookup to find which index in the original dataset corresponds to a given descriptor name string. This way, filtering my dataset for the columns I want looks more like this (similar to AudioGuide, @b.hackbarth):
```
~descriptors = [ "specCent-mean", "specFlat-mean", "pitch-median", "pitchConf-mean", "loud-deriv-mean", "mfcc01", "mfcc02" ... ];
```
Then some code iterates over this array and uses `FluidDataSetQuery` to give me the dataset subset I’m asking for.
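The name-to-index lookup is small enough to sketch. This shows the logic only, in Python rather than SuperCollider, and `descriptor_indices` is a hypothetical helper. In the workflow described here, each resulting index would then be handed to `FluidDataSetQuery` with `.addColumn` before calling `.transform`.

```python
def descriptor_indices(headers, descriptors):
    """Map descriptor names to their column indices in the original dataset.

    `headers` is the list read from the header CSV; `descriptors` is the
    list of names you want, in the order you want them.
    """
    lookup = {name: i for i, name in enumerate(headers)}
    missing = [d for d in descriptors if d not in lookup]
    if missing:
        # Report every unknown name at once (e.g. a typo in the request).
        raise KeyError(f"not in header file: {missing}")
    return [lookup[d] for d in descriptors]


headers = ["specCent-mean", "specFlat-mean", "pitch-median", "pitchConf-mean",
           "loud-deriv-mean", "mfcc01", "mfcc02"]
wanted = ["pitch-median", "specCent-mean", "mfcc01"]
idx = descriptor_indices(headers, wanted)
print(idx)  # [2, 0, 5]
```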
As you can see, if I ask for the descriptors in a certain order in the `~descriptors` array, my brain assumes that the dataset subset will come out in that same order. Now that I think about it and type it out, I do think it would be better for `FluidDataSetQuery` to return the `.transform` dataset with its columns in the order in which the user adds their indices with `.addColumn`. That way, a descriptor list like `~descriptors` would make more sense, and it would also let users reorder columns, which will probably be useful.
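To make the proposed behaviour concrete, here is a toy Python stand-in for a column-selecting query. `ColumnQuery`, `add_column` and the `keep_add_order` flag are all hypothetical and are not the FluCoMa API. The `keep_add_order=False` branch (ascending index order, duplicates dropped) is only an assumed contrasting behaviour for illustration.

```python
class ColumnQuery:
    """Toy column-selecting query (hypothetical, not FluidDataSetQuery)."""

    def __init__(self):
        self.columns = []

    def add_column(self, index):
        # Remember columns in the order the user adds them.
        self.columns.append(index)
        return self

    def transform(self, dataset, keep_add_order=True):
        # keep_add_order=True: the behaviour proposed in the post.
        # keep_add_order=False: ascending index order, whatever the add order.
        cols = self.columns if keep_add_order else sorted(set(self.columns))
        return {ident: [row[i] for i in cols] for ident, row in dataset.items()}


dataset = {"slice-0": [1200.0, 0.2, 60.0, 0.9, -0.5, 3.1, 4.2]}
query = ColumnQuery()
for i in [2, 0, 5]:  # e.g. pitch-median, specCent-mean, mfcc01
    query.add_column(i)

sorted_out = query.transform(dataset, keep_add_order=False)
ordered_out = query.transform(dataset)
print(sorted_out["slice-0"])   # [1200.0, 60.0, 3.1]
print(ordered_out["slice-0"])  # [60.0, 1200.0, 3.1]
```

With insertion order, reordering columns just means adding them in a different order. A real implementation would still have to decide what happens when the same column is added twice: the ordered branch above keeps the duplicate, while the sorted branch drops it.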
I’ll try to clean up my header-file-writing code and put it on the Code Sharing page soon, as others may find it useful. The next week is quite busy for me, so it might be a while.