Lopsided PCA plots

How does one deal with this kind of PCA? Two jerks are ruining everything.

[Image: Screen Shot 2020-08-25 at 2.45.01 PM]

PCA's axes follow the directions of greatest variance in the data, so a couple of extreme points can stretch the whole plot. You must have some real Banditos in there.

You have three options.

1. Find the Banditos and send them to jail, i.e. remove the outliers from the dataset.

2. Use a form of preprocessing or scaling that helps to mitigate outliers. I would recommend a robust scaler that centres on the median and scales to the interquartile range. [1]

3. Transform the projection so that it is more evenly distributed. I'm not sure how easy this is without a tonne of iteration and unfun times, considering you are locked into the dataset paradigm.

[1] http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.robust_scale.html
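To make the robust scaling idea concrete, here is a minimal sketch in plain Python of the arithmetic involved: subtract the median, then divide by the interquartile range. The function name `robust_scale_column` and the sample values are made up for illustration. The scikit-learn function linked in [1] does this per column for you; this is only meant to show what is going on underneath.

```python
from statistics import median, quantiles


def robust_scale_column(values):
    """Centre one feature on its median and divide by its interquartile range."""
    # Quartiles with linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # degenerate column: avoid dividing by zero
    med = median(values)
    return [(v - med) / iqr for v in values]


# Hypothetical feature: seven ordinary points and two extreme ones.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 250.0, 300.0]
scaled = robust_scale_column(feature)

for raw, s in zip(feature, scaled):
    print(f"{raw:7.1f} -> {s:7.2f}")
```

The two extreme points are still far away after scaling, but the median and interquartile range are barely affected by them, so the bulk of the data keeps a sensible spread instead of being squashed the way it would be with standardizing or min-max normalization.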

Also a good tech report on it here


The Donald always gives the best coding advice.



Like @jamesbradbury says, either zap the outliers or use something that’s more robust to them than standardizing or min-max normalization. BufStats will give you the information you need to do robust scaling, but it’s not completely labour-free at the moment.
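If you go the zapping route, one common way to pick out the offenders is the 1.5 × IQR rule (Tukey's fences). That is my assumption here, not something prescribed in this thread, and the function name `split_outliers` and the `pc1` values below are hypothetical. A rough sketch in plain Python, working on one dimension of the projection:

```python
from statistics import quantiles


def split_outliers(values, k=1.5):
    """Return (kept, flagged) index lists using fences at k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    kept = [i for i, v in enumerate(values) if low <= v <= high]
    flagged = [i for i, v in enumerate(values) if not (low <= v <= high)]
    return kept, flagged


# Hypothetical first principal component with two runaway points.
pc1 = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 9.5, 12.0]
kept, flagged = split_outliers(pc1)
print("flagged indices:", flagged)
```

You would then drop the flagged points from the dataset and run the PCA again, since the axes themselves were fitted with the outliers included. The factor `k` is a judgment call: a larger value flags fewer points.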