Selecting dimensions when doing dimensionality reduction

rodrigo.constanzo · August 3, 2023, 12:14pm

What did you use for the dimensionality reduction (PCA, UMAP)?

In either case, the reduction is done in such a way that none of the original dimensions are present any more. Meaning that it’s not an “arbitrary order” of the existing ones, but rather a completely new representation. In the case of PCA, the first of those new dimensions will be the one with the most variance, the second will have the next most, and so on. With UMAP I’m not entirely sure how the dimensional order is done, but I do believe it is random(ly seeded).
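To make the PCA part concrete, here is a rough sketch in plain Python (not FluCoMa code). It assumes some made-up 3d data and finds the components with a simple power iteration, which is just one way of doing it and not necessarily what any particular PCA object does internally. The function names and the data are invented for illustration.

```python
import random
import statistics


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def pca(rows, n_components, iterations=200):
    """Tiny PCA via power iteration with deflation (illustrative only)."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(1)
    components = []
    for _ in range(n_components):
        # Random start vector, then repeatedly multiply and renormalise.
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iterations):
            w = mat_vec(cov, v)
            norm = dot(w, w) ** 0.5
            v = [x / norm for x in w]
        eigenvalue = dot(v, mat_vec(cov, v))
        components.append(v)
        # Remove the direction just found so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[dot(row, c) for c in components] for row in centered]
    return projected, components


# Made-up 3d data in which every original dimension is a mix of hidden factors.
data_rng = random.Random(0)
rows = []
for _ in range(200):
    a, b, c = data_rng.gauss(0, 5), data_rng.gauss(0, 1), data_rng.gauss(0, 0.2)
    rows.append([a + b, a - b, 0.5 * a + c])

projected, components = pca(rows, n_components=2)
variances = [statistics.variance(col) for col in zip(*projected)]
print("variance per new dimension:", variances)
print("first component as a mix of the originals:", components[0])
```

The first new dimension ends up with more variance than the second, and each component is a weighted mix of all three original columns rather than any single one of them.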

There is a way to determine which of your original dimensions carry the most variance (a good thread on SVM here), but that may not be ideal for a 2d plot.
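As a very simple take on this (which may well differ from the approach in the linked thread), you could compute the variance of each original dimension and keep the top two for plotting. The descriptor names and values below are invented, and the ranking depends heavily on how each descriptor is scaled, so treat it as a sketch only.

```python
import statistics

# Hypothetical dataset: one row per data point, one column per original descriptor.
names = ["loudness", "centroid", "flatness", "pitch"]
data = [
    [0.10, 0.80, 0.30, 0.50],
    [0.12, 0.20, 0.31, 0.90],
    [0.11, 0.60, 0.29, 0.10],
    [0.09, 0.40, 0.30, 0.70],
    [0.10, 0.95, 0.32, 0.30],
]

# Variance of each original column, then rank from most to least variance.
columns = list(zip(*data))
variances = {name: statistics.variance(col) for name, col in zip(names, columns)}
ranked = sorted(variances, key=variances.get, reverse=True)

# Keep the two highest-variance original dimensions for a 2d plot.
top_two = ranked[:2]
print(top_two)
```

Unlike PCA, this keeps two of the original dimensions as they are and simply throws the rest away, which is part of why it may not give the most useful 2d plot.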

The tags (labels?) can remain attached to each data point regardless of how many dimensions there are. As in, if your first data point (“0” or whatever) has 13d of MFCC data, or 2d of PCA data, it still is point “0”, and if you have a corresponding label (from fluid.labelset~) for “0”, then you can show that just the same.
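In plain Python terms (dictionaries standing in for a dataset and a labelset, not the actual FluCoMa API), the idea looks roughly like this. The identifiers, values, and label names are all made up.

```python
# Hypothetical 13d MFCC data, keyed by point identifier.
mfcc = {
    "0": [0.1 * i for i in range(13)],
    "1": [0.2 * i for i in range(13)],
}

# Hypothetical 2d output of a reduction, keyed by the same identifiers.
reduced = {
    "0": [0.25, -1.50],
    "1": [1.75, 0.50],
}

# Hypothetical labels, again keyed by identifier.
labels = {"0": "kick", "1": "snare"}

# The identifier is what ties a label to a point, whatever the dimension count.
labelled_points = {key: (point, labels[key]) for key, point in reduced.items()}
print(labelled_points["0"])
```

Because the lookup only uses the identifier, the same labels work against the 13d version and the 2d version of the data.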