I have a dataset with 160 dimensions and want to reduce it with PCA while keeping 95% of the total variance (a fraction of 0.95).
But I’m running into a problem when the number of rows is smaller than the number of dimensions I ask for. In that case, fittransform fills the new dataset with zeros.
I’m not sure if this is a bug or me not getting the concept. It is the first time that I’m working with such a large number of dimensions.
The original dataset has 99 rows and 160 columns:
fluid.dataset~: DataSet pca.std:
rows: 99 cols: 160
000001 0.29054 -0.33205 -0.14171 … 0.58806 -0.0076716 0.14521
000002 0.35523 -0.55267 0.10501 … -1.0461 0.51816 0.44641
000003 2.7544 -2.5785 -2.6737 … -0.1726 1.1397 0.2607
…
000097 -0.18875 -0.64774 -0.75295 … -0.4076 3.6126 0.34052
000098 -0.62454 1.2527 1.3371 … 0.013334 -0.90388 -1.3311
000099 -0.37633 0.94823 0.46386 … 0.69903 -0.44015 0.80934
If I limit the number of dimensions to 99, the reduced table contains values:
PCA.dimensions.for.95%: 76
fluid.dataset~: DataSet pca.reduced:
rows: 99 cols: 99
000001 4.5537 -1.2961 3.6897 … -0.07921 0.031032 6.2655e-16
000002 7.9774 -1.2558 -1.6491 … 0.046491 -0.22899 -3.8173e-15
000003 3.9425 -14.258 -3.059 … 0.00081779 0.031055 -5.6719e-15
…
000097 -7.3595 -13.407 -4.8174 … 0.049414 0.0034762 1.9137e-15
000098 -0.28963 4.7732 -3.8851 … -0.062684 -0.0117 -9.7668e-16
000099 -3.3875 5.8917 1.93 … -0.18516 -0.03085 2.4009e-15
PCA.dimensions.for.95%: 76
But if I ask for more dimensions than the number of rows (here 100 dimensions for 99 rows), the resulting reduced dataset contains only zeros:
PCA.dimensions.for.95%: 76
fluid.dataset~: DataSet pca.reduced:
rows: 99 cols: 100
000001 0 0 0 … 0 0 0
000002 0 0 0 … 0 0 0
000003 0 0 0 … 0 0 0
…
000097 0 0 0 … 0 0 0
000098 0 0 0 … 0 0 0
000099 0 0 0 … 0 0 0
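As a cross-check outside Max, here is a minimal NumPy sketch of what I think is going on conceptually. It uses random data in place of my real dataset, and `pca_summary` is just a name I made up for illustration; it is not how FluCoMa implements PCA. As far as I understand, once the data is centered, 99 rows can span at most 98 directions, so no more than 98 components can carry any variance. That would fit the ~1e-16 values in the last column of the 99-column result above.

```python
import numpy as np

def pca_summary(data, variance_fraction=0.95):
    """Return (singular values, numerical rank, components needed for variance_fraction)."""
    # PCA works on mean-centered data; centering removes one degree of freedom.
    centered = data - data.mean(axis=0)
    # Squared singular values are proportional to the variance of each component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    # Treat singular values below a floating-point tolerance as zero.
    tol = singular_values.max() * max(data.shape) * np.finfo(float).eps
    rank = int(np.sum(singular_values > tol))
    # Smallest number of components whose cumulative variance reaches the target.
    cumulative = np.cumsum(variances) / variances.sum()
    needed = int(np.searchsorted(cumulative, variance_fraction) + 1)
    return singular_values, rank, needed

# Assumption: random stand-in data with the same shape as my dataset (99 rows, 160 cols).
rng = np.random.default_rng(0)
data = rng.standard_normal((99, 160))

singular_values, rank, needed = pca_summary(data)
print("components available:", len(singular_values))
print("components with non-zero variance:", rank)
print("components needed for 95% variance:", needed)
```

So asking for 100 components from 99 rows requests more than the data can provide. The sketch does not tell me why the output becomes all zeros in that case rather than being clamped or raising an error.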
Anybody with a brilliant explanation/solution?
Thanks, Hans