FluidDataSetQuery transform column order

tedmoore · January 6, 2021, 6:05am

Greetings,

Just an observation that tripped me up for a second. When using FluidDataSetQuery, the order in which the user adds columns with .addColumn is not the order they end up in after .transform: they stay in their original order, and the columns that were not selected are simply removed.

This is fine as it is, but it might be good to state it explicitly somewhere (I did, perhaps naively, expect them to be in the order I added them).
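
To make the observation concrete, here is a minimal plain-array sketch that needs no server. It only models the behaviour described above and does not call FluidDataSetQuery; the column indices and the variable names `~requested` and `~observed` are made up for illustration.

```supercollider
(
// Hypothetical: the source dataset has columns 0..5, and the user
// calls .addColumn with 4, then 2, then 1.
~requested = [4, 2, 1];

// Behaviour described above: the selected columns keep their
// original relative order, so the result is ordered by source index.
~observed = ~requested.copy.sort;
~observed.postln;   // -> [ 1, 2, 4 ]

// What one might naively expect instead: the order of the .addColumn calls.
~requested.postln;  // -> [ 4, 2, 1 ]
)
```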

tremblap · January 6, 2021, 9:40am

Actually, reordering them might be a possible feature request, if that makes sense to you, but it’ll need to be discussed with the guys. For now, just riff on why it is a good idea, and that’ll get us thinking.

tedmoore · January 6, 2021, 6:49pm

@tremblap, here’s my broader thinking / context around this.

One thing I’ve started doing when making a dataset and saving it to disk is adding a csv file with the “column headers” next to it in the same folder, so that I know what’s in that dataset if I come back to it later (I have also been adding a similar file with “source file paths”).

Usually these datasets are quite large, the result of an “analyze everything” approach, and I decide later which features I want to use, selecting columns with FluidDataSetQuery (I know what the columns are from my csv file of headers). To do this, I use the csv file as a little lookup to find which index in the original dataset corresponds to a descriptor name string. This way, filtering my dataset for the columns I want looks more like this (similar to AudioGuide, @b.hackbarth):

~descriptors = [ "specCent-mean","specFlat-mean","pitch-median","pitchConf-mean","loud-deriv-mean","mfcc01","mfcc02" ... ];

Some code then iterates over this array and uses FluidDataSetQuery to give me the dataset subset I’m asking for.
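
A rough sketch of what that lookup and iteration could look like. It is not the actual code mentioned in this thread: the header list, `~indices`, `~source` and `~subset` are hypothetical names, `~source` and `~subset` are assumed to be FluidDataSets that already exist on the default server `s`, and the exact FluidDataSetQuery behaviour may differ between FluCoMa versions.

```supercollider
(
// Hypothetical header list, as it would be read back from the csv file
// saved next to the dataset (one name per column, in column order).
~headers = ["loud-mean", "loud-deriv-mean", "pitch-median", "pitchConf-mean", "specCent-mean", "specFlat-mean"];

// The descriptors wanted, in the order the user thinks of them.
~descriptors = ["specCent-mean", "pitch-median", "loud-deriv-mean"];

// Look up the column index of each descriptor name in the header list.
~indices = ~descriptors.collect { |name|
	var index = ~headers.indexOfEqual(name);
	if(index.isNil) { Error("unknown descriptor: " ++ name).throw };
	index
};
~indices.postln; // -> [ 4, 2, 1 ]

// Assumption: ~source and ~subset are existing FluidDataSets on server s.
fork {
	~query = FluidDataSetQuery(s);
	// Add one column per requested descriptor, waiting for the server each time.
	~indices.do { |index|
		~query.addColumn(index);
		s.sync;
	};
	// Copy the selected columns of ~source into ~subset.
	~query.transform(~source, ~subset);
};
)
```

With the behaviour described at the top of this thread, the columns of `~subset` would come out in source order (indices 1, 2, 4) rather than in the order of `~descriptors`, so the matching header list for the subset would be `~indices.copy.sort.collect { |i| ~headers[i] }`.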

As you can see, if I ask for the descriptors in a certain order in this array, my brain assumes that the dataset subset will be in that order. Now that I think about it and type it out, I do think it would be better for FluidDataSetQuery’s .transform to return the dataset with the columns in the order the user added their indices with .addColumn. That way, something like the above makes more sense, and it also lets a user reorganize columns, which will probably be useful.

I’ll try to clean up my header-file-writing code and put it on the Code Sharing page soon, as others may find it useful. The next week is quite busy for me, so it might be a bit.

tremblap · January 6, 2021, 7:40pm

Thanks, that makes sense. Let us think about it now.