How does one deal with this kind of PCA? Two jerks are ruining everything.
PCA's dimensions are chosen to capture as much variance as possible, so you must have some real Banditos in there stretching the axes.
You have three options.
1. Find the Banditos and send them to jail, i.e. remove the outliers before fitting.
2. Use a form of preprocessing or scaling that helps to mitigate outliers. I would recommend a robust scaler that centres on the median and scales to the interquartile range. [1]
3. Transform the projection to be more evenly distributed. Not sure how easy this is without a tonne of iteration and unfun times, considering you are locked into the dataset paradigm.
[1] http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.robust_scale.html
There is also a good comparison of the different scalers on data with outliers here:
http://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html
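For what it's worth, here is a rough sketch of the robust scaling option in Python with scikit-learn. The data is made up (200 random points with two planted outliers standing in for the Banditos), so treat the names and numbers as assumptions rather than anything from the original patch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import RobustScaler

# Hypothetical data: 200 points in 4 dimensions, plus two planted outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:2] += 50.0  # the two "Banditos"

# RobustScaler defaults: subtract the per-column median,
# divide by the per-column interquartile range (25th to 75th percentile).
X_rob = RobustScaler().fit_transform(X)

# Project the robustly scaled data down to 2 dimensions.
proj = PCA(n_components=2).fit_transform(X_rob)
print(proj.shape)
```

Note that the outliers are still in the data after scaling. The scaler only stops them from dictating the centre and spread used for normalisation, so this mitigates the problem rather than removing it.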
The Donald always gives the best coding advice.
Sam
Like @jamesbradbury says, either zap the outliers or use something that’s more robust to them than standardizing or min-max normalization. BufStats will give you the information you need to do robust scaling, but it’s not completely labour-free at the moment.
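The arithmetic itself is small once you have the statistics. Below is a minimal sketch in plain Python, assuming you have already pulled out the median plus a low and a high percentile for one dimension (here the 25th and 75th). The `robust_scale` helper and the example values are hypothetical and are not part of any FluCoMa API.

```python
import statistics

def robust_scale(values, low, mid, high):
    """Centre on the median (mid) and divide by the spread between
    the low and high percentiles (the interquartile range here)."""
    spread = high - low
    if spread == 0:
        raise ValueError("low and high percentiles are equal; cannot scale")
    return [(v - mid) / spread for v in values]

# Hypothetical single dimension with one obvious outlier.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]

# Quartiles computed here with the standard library for the sake of the example.
low, mid, high = statistics.quantiles(values, n=4, method="inclusive")

scaled = robust_scale(values, low, mid, high)
print(scaled)
```

The outlier still ends up far from everything else, but the rest of the points keep a sensible spread instead of being squashed together near zero.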