Dimensionality reduction (mixing the rational and irrational)

rodrigo.constanzo · June 29, 2020, 11:10am

I’ll try PCA and see how that fares. I think if I’m just in millisecond land, it would (intuitively) give results that made sense that way. If I include derivatives that would probably crumble without sanitization.

Curious about this PCA->non-linear workflow too. By this do you mean doing PCA on something to bring it down to a smaller number of dimensions, and then doing MDS on that lower-dimensional version?
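To pin down what I think that workflow means, here is a minimal sketch, assuming scikit-learn's `PCA` and `MDS` as stand-ins for whatever tools would actually be used. The dataset is random placeholder data, and the sizes (60 files, 20 descriptors, 5 PCA components) are arbitrary assumptions, not recommendations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS

rng = np.random.default_rng(0)

# Hypothetical dataset: 60 sound files x 20 descriptors (random stand-in values).
descriptors = rng.normal(size=(60, 20))

# Step 1: linear reduction with PCA, 20 -> 5 dimensions.
reduced = PCA(n_components=5).fit_transform(descriptors)

# Step 2: non-linear layout of the PCA output with MDS, 5 -> 2 dimensions.
embedded = MDS(n_components=2, random_state=0).fit_transform(reduced)

print(reduced.shape, embedded.shape)
```

The idea, as far as I understand it, is that PCA throws away the low-variance directions cheaply, and the slower non-linear step then only has to work on the few dimensions that are left.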

I’m not really clear on things either, but say I have one file with a duration of 5000 and a time centroid of 2500, and another file with a duration of 400 and a time centroid of 200. By not sanitizing the data, I was hoping to maintain that difference in scale, rather than them having somewhat similar values (?) after sanitization.
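To make that worry concrete, here is a small sketch using the two files above. It assumes that 'sanitization' means column-wise standardization (subtract the column mean, divide by the column standard deviation) across the dataset, which may not be exactly what is being proposed. The file labels `long` and `short` are made up.

```python
from statistics import mean, pstdev

# Values from the example above: (duration, time centroid) per file.
files = {"long": (5000.0, 2500.0), "short": (400.0, 200.0)}


def standardize(column):
    """Column-wise z-score: subtract the mean, divide by the population std."""
    mu, sigma = mean(column), pstdev(column)
    return [(value - mu) / sigma for value in column]


durations = [duration for duration, _ in files.values()]
centroids = [centroid for _, centroid in files.values()]

z_durations = standardize(durations)
z_centroids = standardize(centroids)

print(z_durations)  # [1.0, -1.0]
print(z_centroids)  # [1.0, -1.0]
```

Under that assumption the two files stay clearly apart in each column (one at +1, the other at -1), so the ordering survives. What gets lost is the absolute scale: the 4600 difference in duration and the 2300 difference in time centroid end up looking identical.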

More practically, I want the fact that the first sample has bigger values to be present in the ‘timeness’ metric that I can query later on, if, for example, I want to choose samples that are ‘timeness’-ier (yikes!).

In spirit I guess, but I’m still struggling with my normal use cases, where I find the nearest match on certain fields and then apply some other criteria to other fields (à la biasing).
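Roughly what I mean by that use case, as a plain-Python sketch. The corpus, field names, values, and the `nearest` helper are all hypothetical, and the distance is computed on raw, unscaled values, so the field with the largest numbers (centroid here) dominates the match.

```python
import math

# Hypothetical corpus: names, fields, and values are made up for illustration.
corpus = [
    {"name": "a", "loudness": -20.0, "centroid": 1200.0, "duration": 5000.0},
    {"name": "b", "loudness": -19.0, "centroid": 1250.0, "duration": 400.0},
    {"name": "c", "loudness": -30.0, "centroid": 3000.0, "duration": 4500.0},
]


def nearest(target, entries, match_fields, keep=lambda entry: True):
    """Return the entry closest to target on match_fields among those passing keep."""
    candidates = [entry for entry in entries if keep(entry)]
    if not candidates:
        return None

    def distance(entry):
        # Euclidean distance over the matched fields only.
        return math.sqrt(sum((entry[f] - target[f]) ** 2 for f in match_fields))

    return min(candidates, key=distance)


target = {"loudness": -19.5, "centroid": 1240.0}
fields = ["loudness", "centroid"]

# Plain nearest match on loudness and centroid.
unbiased = nearest(target, corpus, fields)

# Same match, but only among entries longer than 1000 (the extra criterion).
biased = nearest(target, corpus, fields, keep=lambda e: e["duration"] > 1000.0)

print(unbiased["name"], biased["name"])  # b a
```

The matching fields and the criterion field never get mixed into one distance here, which is the part I don't know how to keep once everything has been squashed into a reduced space.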