Revisiting this workflow now as I’m presently building some test/comparison patches.
At the moment the end results of this process will go into a fluid.kdtree~. In looking back at this, there is an additional normalization step after the PCA-ing. Is this typically needed? I guess the output of PCA can be a bit “all over the place” (though it would be good to see the results of that numerically).
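To see it numerically, here’s a quick sketch in Python with scikit-learn standing in for the FluCoMa objects (the data here is made up, and the dimension counts are arbitrary assumptions). Because PCA orders its components by explained variance, the first output dimension has a much bigger spread than the last, which skews Euclidean distances in a kd-tree unless you normalize afterwards:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical descriptor corpus: 200 points x 10 dimensions,
# with deliberately mismatched per-dimension scales
X = rng.normal(size=(200, 10)) * rng.uniform(1.0, 20.0, size=10)

X_scaled = RobustScaler().fit_transform(X)
X_pca = PCA(n_components=4).fit_transform(X_scaled)

# Per-component spread after PCA: sorted largest-to-smallest,
# so the output space is lop-sided for distance matching
print(X_pca.std(axis=0))
```

The printed standard deviations fall off component by component, which is presumably why the extra normalization step sits between the PCA and the kd-tree.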
So far I’ve built a vanilla workflow that takes new incoming points and applies robust scaling, PCA, and normalization before feeding them into knearest.
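In pseudo-Python, that vanilla workflow looks roughly like the sketch below, using scikit-learn as a stand-in for the FluCoMa objects (NearestNeighbors plays the role of fluid.kdtree~/knearest; the corpus, sizes, and component count are assumptions for illustration). The key detail is that every transform is fitted once on the corpus and then the same fitted chain is applied to each new incoming point:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
corpus = rng.normal(size=(500, 24))  # hypothetical descriptor corpus

# Fit the chain on the corpus once: robust scale -> PCA -> normalize
scaler = RobustScaler().fit(corpus)
pca = PCA(n_components=8).fit(scaler.transform(corpus))
norm = MinMaxScaler().fit(pca.transform(scaler.transform(corpus)))
reduced = norm.transform(pca.transform(scaler.transform(corpus)))

# Stand-in for fluid.kdtree~ built on the reduced corpus
tree = NearestNeighbors(n_neighbors=1).fit(reduced)

# New incoming point goes through the same fitted transforms
new_point = rng.normal(size=(1, 24))
q = norm.transform(pca.transform(scaler.transform(new_point)))
dist, idx = tree.kneighbors(q)
print(idx[0][0])  # index of the nearest corpus point
```

Note that a genuinely new point can land slightly outside the [0, 1] range the normalizer learned from the corpus; that's expected and harmless for nearest-neighbour lookup.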
I’m going to try building similar versions with MLP instead, as well as one that does no PCA-ing at all (just using smaller initial dimensions, à la the SVM/PCA stuff).