IQR-ing corpora

Ok, revisiting this today.

So first the noob-y question. Does IQR-ification happen inside fluid.bufnormalize~ (e.g. @min 0.25 @max 0.75)? According to the helpfile those attributes set the output range, and if I’m understanding correctly, the IQR is supposed to be computed from the input data instead.
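To make the distinction concrete for myself, here is a minimal numpy sketch of the two ideas. It is not how any FluCoMa object is implemented; `minmax_scale` and `iqr_scale` are hypothetical helper names, and the data is random. The first function squeezes the data into an output range (what I understand @min/@max to mean), the second measures the 25th/75th percentiles of the input and scales by that.

```python
import numpy as np

def minmax_scale(x, out_min=0.0, out_max=1.0):
    """Map the observed min/max of each column onto [out_min, out_max]."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
    return out_min + (x - lo) / span * (out_max - out_min)

def iqr_scale(x, q_low=25.0, q_high=75.0):
    """Centre each column on its median and divide by its IQR (from the input)."""
    q1, med, q3 = np.percentile(x, [q_low, 50.0, q_high], axis=0)
    iqr = np.where(q3 > q1, q3 - q1, 1.0)  # avoid dividing by zero
    return (x - med) / iqr

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 1))  # made-up single descriptor
data[0, 0] = 50.0                 # one deliberate outlier

squeezed = minmax_scale(data, 0.25, 0.75)  # output range is 0.25..0.75
robust = iqr_scale(data)                   # middle 50% of the data spans 1.0
```

In the first case the outlier still defines one end of the range, so everything else gets crushed together. In the second the outlier just ends up far away from a well-spread middle.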

Now, onto a cool thing that @tedmoore suggested ages ago and that I’ve only just gotten around to. For my Time Travel stuff, Ted suggested a sanity check to see if the descriptors from 256 samples are even in the same ballpark as what you get from 4410, or more specifically, if you can (somewhat) accurately predict a 4410 window with only 256 samples (given a finite and predefined set of inputs).

I had been putting this off as I had no (easy) way to visualize stuff, but that’s sorted now.

Before I get on to the question(s) stuff, here are the results of my first test with this.

This is feeding the same audio into the same process (I think (more on this below)) for both window sizes, and then plotting the two sets of results in the same reduced (umap) space.

The results look pretty good actually. Not perfect, but not completely incompatible.

So my process here was to take a 42d space (20 MFCCs (19), loudness, pitch, with mean/std of everything), then standardize, umap, normalize, then plot.

My question is with regards to doing a workflow like this to see how things overlap in an absolute sense.

What I did here was run that process (standardize->umap->normalize) on the 4410 dataset, then write the fits for those three objects/processes to disk, then load up the 256 sample version, read all those .jsons, then transform (instead of my initial fittransform).

Is that correct?
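For reference, here is the workflow described above as a rough numpy sketch, so it is clear what I mean by fitting once and then only transforming. Big assumptions: PCA stands in for UMAP (just to keep it self-contained), the data is random, and `fit_pipeline`, `transform` and the file name are hypothetical names rather than anything from FluCoMa.

```python
import json
import os
import tempfile
import numpy as np

def fit_pipeline(x, n_components=2):
    """Fit standardize -> PCA (stand-in for UMAP) -> normalize on x only."""
    mean, std = x.mean(axis=0), x.std(axis=0)
    std = np.where(std > 0, std, 1.0)
    z = (x - mean) / std
    _, _, vt = np.linalg.svd(z, full_matrices=False)  # PCA via SVD
    components = vt[:n_components]
    reduced = z @ components.T
    lo, hi = reduced.min(axis=0), reduced.max(axis=0)
    return {"mean": mean.tolist(), "std": std.tolist(),
            "components": components.tolist(),
            "lo": lo.tolist(), "hi": hi.tolist()}

def transform(x, fit):
    """Apply previously fitted parameters without refitting anything."""
    mean, std = np.array(fit["mean"]), np.array(fit["std"])
    components = np.array(fit["components"])
    lo, hi = np.array(fit["lo"]), np.array(fit["hi"])
    span = np.where(hi > lo, hi - lo, 1.0)
    reduced = ((x - mean) / std) @ components.T
    return (reduced - lo) / span

rng = np.random.default_rng(1)
long_windows = rng.normal(size=(100, 42))  # made-up "4410" descriptors
short_windows = long_windows + rng.normal(scale=0.3, size=(100, 42))  # made-up "256"

# Fit on the 4410 set and write the fit to disk.
fit = fit_pipeline(long_windows)
path = os.path.join(tempfile.mkdtemp(), "fit_4410.json")
with open(path, "w") as f:
    json.dump(fit, f)

# Later: read the fit back and transform both sets with it.
with open(path) as f:
    loaded = json.load(f)
long_2d = transform(long_windows, loaded)
short_2d = transform(short_windows, loaded)
```

One thing this makes obvious: points from the 256 set can land outside 0..1, because the normalizer's range was fitted on the 4410 set only.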

And as a follow-up: if I understand @weefuzzy’s previous posts correctly, other than figuring out where the IQR stuff fits in, if I wanted to force the overlap between these descriptor spaces, would I independently standardize/umap/normalize each one? Or would I keep the same umap-ing, so things kind of relate?
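Here is how I picture the difference, at least for the standardize step, as a small numpy sketch. The two datasets are made up (one is deliberately offset and narrower), and this says nothing about what UMAP itself would do with either option.

```python
import numpy as np

rng = np.random.default_rng(2)
set_a = rng.normal(loc=0.0, scale=1.0, size=(500, 3))  # made-up reference set
set_b = rng.normal(loc=2.0, scale=0.5, size=(500, 3))  # made-up, offset and narrower

def standardize(x, mean, std):
    """Plain z-scoring with whatever mean/std it is handed."""
    return (x - mean) / std

mean_a, std_a = set_a.mean(axis=0), set_a.std(axis=0)
mean_b, std_b = set_b.mean(axis=0), set_b.std(axis=0)

# Shared fit: set_b is scaled with set_a's statistics, so its offset survives.
shared_b = standardize(set_b, mean_a, std_a)

# Independent fit: set_b is scaled with its own statistics, so it is forced
# onto zero mean / unit std, just like set_a would be.
independent_b = standardize(set_b, mean_b, std_b)

offset_shared = np.abs(shared_b.mean(axis=0)).max()
offset_independent = np.abs(independent_b.mean(axis=0)).max()
```

So the shared fit keeps the absolute difference between the sets visible, while the independent fit throws it away and lines the two clouds up on top of each other.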

(as an aside, my intended workflow at the moment is to not use UMAP/PCA at all in the processing, other than for visualization, but still want to wrap my head around this side of things)