Robustscale a slice against a larger dataset

muddywires · March 9, 2023, 5:56pm

Is it possible to use robustscale/standardize/normalize on a buffer but have it perform the scaling based off of a larger dataset?

I am trying to train a timbre classifier with a stream of MFCC data analyzed from ampslices. My thought was that it would perform better if each training slice sent to the knnclassifier~ was first scaled against the range of the entire set of possible input sounds, as opposed to applying robust scaling to each individual training sample on its own.

Is there a convenient way to tell fluid.robustscale~ to fittransform dataset A against dataset B and write it to dataset C?

muddywires · March 9, 2023, 6:04pm

The workaround I’m otherwise considering is to concatenate each training and inference slice onto the entire dataset, apply robust scaling, then slice off the scaled slice before sending it to the classifier.

tremblap · March 9, 2023, 6:11pm

If you fit any scaler on dsB and then use transform dsA dsC, it will do exactly that, if I understand your question right.
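
To make the fit/transform split concrete, here is a minimal NumPy sketch of the idea. It is not FluCoMa code: `fit_robust` and `transform_robust` are hypothetical helper names, the random arrays stand in for real MFCC datasets, and the 25th/75th percentile range is an assumed setting for the robust scaler.

```python
import numpy as np

def fit_robust(data, low=25.0, high=75.0):
    """Learn per-column median and percentile spread from `data` (rows = points)."""
    median = np.median(data, axis=0)
    lo, hi = np.percentile(data, [low, high], axis=0)
    spread = hi - lo
    spread[spread == 0] = 1.0  # guard against division by zero on constant columns
    return median, spread

def transform_robust(data, median, spread):
    """Scale `data` with statistics that were learned elsewhere."""
    return (data - median) / spread

rng = np.random.default_rng(0)
ds_b = rng.normal(size=(500, 13))  # stand-in for the whole corpus (13 MFCCs per point)
ds_a = rng.normal(size=(4, 13))    # stand-in for one slice to classify

median, spread = fit_robust(ds_b)               # "fit" on dsB only
ds_c = transform_robust(ds_a, median, spread)   # "transform" dsA into dsC
```

The point is that the statistics come only from dsB, so every slice is scaled in the same way no matter what it contains. The concatenation workaround would shift the median and spread slightly each time a new slice is appended.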

muddywires · March 9, 2023, 6:19pm

Ah yes, that’s it. I was getting confused about the various message types. Thank you.