IQR-ing corpora

I did watch the video, and it helped clarify how IQR can be a better fit for misshapen (non-Gaussian) datasets.
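For anyone following along, here's a rough sketch of why that is, using only Python's standard library. The numbers are made up for illustration (they aren't from any corpus discussed here), the function names are hypothetical, and the 1.5 × IQR fence is just the common rule-of-thumb convention, not necessarily what the video uses.

```python
import statistics

def spread_summary(values):
    """Compare mean/stdev against median/IQR for the same data."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": median,
        "iqr": q3 - q1,
    }

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is a convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up, lopsided data: mostly small values plus one extreme point
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]

summary = spread_summary(data)
print(summary)             # mean and stdev get dragged around by the 100
print(iqr_outliers(data))  # [100]
```

The median and IQR stay put (3.0 and 1.75 here) while the mean and standard deviation are pulled way out by the single extreme value, which is the gist of why IQR copes better with skewed data.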

I remember going through that patch a while ago, though it would be worth revisiting with IQR at hand, as I struggled to pull the most useful information out of it before, for the reasons you’ve outlined here.

I guess, in general, it’s still useful to know “best practice” stuff as training wheels of sorts, as it’s (near) impossible to learn anything when the answer is “become a data scientist”. Or, to use a more concrete metaphor (which I may have mentioned on a Thursday geekout): it’s like I’m trying to learn to tune a guitar, and we end up talking about 8-string early instrument intonation approaches as applied to contemporary classical music interpretation, etc… Maybe it’s useful just to know that the first string should be tuned to E, then build from there.