Transform in place for scaling

jamesbradbury · July 21, 2020, 9:50am

Is there an application of normalize and standardize that happens outside of a dataset~ object? I think it would be much easier to call these as methods on the fluid.dataset~ object itself rather than having to copy the data to a new place. It becomes a bit of a gymnastics exercise keeping tabs on where everything is passed, particularly if you have multiple stages of preprocessing before you train an MLP, for example. I tend to go with a naming structure like input, input.norm, input.norm.std just to keep track of where everything goes, but every stage needs more objects, and I have to route the outputs to get an automatic one-button pipeline going.

Instead, it would be nice if you could just call these methods directly on the object, or copy to a new dataset if you are going to transform new inputs based on the fit of the norm/std process. Just my 2c.

rodrigo.constanzo · July 21, 2020, 9:55am

You can always do fittransform datasetName datasetName and it will do it “in place”. @weefuzzy warned in another thread that, at the moment, that may not be perfectly safe (or that might have been in reference to the buffer~ equivalent).

But in general, interface-wise, I prefer doing ‘in place’ things, as the plumbing required for having individual fluid.dataset~s and buffer~s for every step of the process is my least favorite part of programming (particularly for getting all the size things right for @blocking 2 stuff).

jamesbradbury · July 21, 2020, 9:56am

Yeah, I’ve been doing the in-place method as I never need to go back to the unnormalised data.

rodrigo.constanzo · July 21, 2020, 9:58am

A really nice interface tweak would be that if you only specify an input fluid.dataset~, it just does it ‘in place’ by default.

So fittransform datasetName is all you need, and you don’t have to double up the messaging.
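
To make the interface idea concrete, here is a minimal Python sketch of a "destination defaults to source" fit-transform. It is not FluCoMa code: the `Dataset` type (a plain dict standing in for a fluid.dataset~) and the `fit_transform` function are hypothetical names made up for illustration, and min-max scaling to 0..1 is assumed as the normalisation. The sketch computes every output before writing anything, which is one way to keep the case where source and destination are the same object safe.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a fluid.dataset~: identifier -> list of values.
Dataset = Dict[str, List[float]]


def fit_transform(source: Dataset, dest: Optional[Dataset] = None) -> Dataset:
    """Min-max scale each column of `source` to 0..1 and write it to `dest`.

    If `dest` is omitted, the result overwrites `source` ("in place").
    """
    if dest is None:
        dest = source  # the proposed default: only one name needed

    # Fit: per-column minimum and maximum across all points.
    columns = list(zip(*source.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    # Transform into a temporary dict first, so nothing is read from
    # `source` after it has been modified when dest is source.
    scaled = {
        key: [
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0.0
            for v, lo, hi in zip(point, lows, highs)
        ]
        for key, point in source.items()
    }

    dest.clear()
    dest.update(scaled)
    return dest


raw = {"a": [0.0, 10.0], "b": [5.0, 20.0], "c": [10.0, 30.0]}
normed: Dataset = {}

fit_transform(raw, normed)  # two names: raw is left untouched
fit_transform(raw, raw)     # same name twice: in place
fit_transform(raw)          # proposed shorthand, equivalent to the line above
```

The two-name form is still there for the case where you want to keep the original data or the intermediate stages around.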