I’ll try PCA and see how that fares. Intuitively, if all my descriptors are in milliseconds, I think it would give results that make sense. If I include derivatives, though, it would probably fall apart without sanitization.
I’m curious about this PCA->non-linear workflow too. Do you mean doing PCA to bring the data down to a smaller number of dimensions, and then running MDS on that lower-dimensional version?
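If I’ve understood the idea, it would look roughly like this numpy sketch. First, PCA (via SVD) brings the data down to a few dimensions. Then metric MDS runs on the distances between the reduced points, using the SMACOF iteration that scikit-learn’s `MDS` is built on. The random data, the choice of 5 components and all the function names are made up for illustration. This is my guess at the workflow, not necessarily what was meant.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto their first n_components principal axes."""
    Xc = X - X.mean(axis=0)  # centre each descriptor column
    # Rows of Vt are the principal axes, sorted by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def pairwise_dist(Y):
    """Euclidean distance between every pair of rows in Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def smacof_mds(D, n_components=2, n_iter=300, seed=0):
    """Metric MDS: place points so their distances approximate D (SMACOF)."""
    n = D.shape[0]
    Y = np.random.default_rng(seed).standard_normal((n, n_components))
    for _ in range(n_iter):
        dist = pairwise_dist(Y)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(dist > 0, D / dist, 0.0)
        B = -ratio
        np.fill_diagonal(B, -B.sum(axis=1))  # each row of B now sums to zero
        Y = B @ Y / n  # Guttman transform: never increases the stress
    return Y

# Made-up data: 50 sounds x 20 descriptors.
X = np.random.default_rng(1).standard_normal((50, 20))
reduced = pca_reduce(X, 5)                      # step 1: PCA down to 5 dims
embedding = smacof_mds(pairwise_dist(reduced))  # step 2: MDS down to 2 dims
print(embedding.shape)  # (50, 2)
```

Because MDS starts from a random layout, the result depends on `seed`. scikit-learn’s `MDS` runs several initialisations (`n_init`) and keeps the lowest-stress one.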
I’m not really clear on things either, but say I have one file with a duration of 5000 ms and a time centroid of 2500 ms, and another file with a duration of 400 ms and a time centroid of 200 ms. By not sanitizing the data, I was hoping to keep that difference in scale, rather than have both files end up with fairly similar values after sanitization (if that is indeed what happens).
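To check what sanitizing actually does to those two files, here’s a small numpy sketch. It assumes “sanitizing” means z-scoring or min-max scaling each column. Rows 0 and 1 are the two files above; rows 2 and 3 are invented filler so the corpus has some spread.

```python
import numpy as np

# Columns: duration (ms), time centroid (ms). Rows 0 and 1 are the two files
# from the example; rows 2 and 3 are made-up filler so the corpus has spread.
corpus = np.array([
    [5000.0, 2500.0],
    [400.0, 200.0],
    [1200.0, 700.0],
    [3000.0, 1100.0],
])

def standardize(X):
    """z-score each column: subtract its mean, divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def normalize(X):
    """Min-max scale each column to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

z = standardize(corpus)
mm = normalize(corpus)

# Both scalings are increasing within each column, so the sort order of
# every column (and thus "which file is bigger") survives unchanged.
for col in range(corpus.shape[1]):
    print(np.argsort(corpus[:, col]), np.argsort(z[:, col]), np.argsort(mm[:, col]))
print(mm[:2])  # the two example files after min-max scaling
```

What gets lost is the absolute size. After min-max scaling, both columns run from 0 to 1, so the 4600 ms spread in duration no longer outweighs the 2300 ms spread in centroid when distances are computed.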
More practically, I want the fact that the first sample has bigger values to show up in the ‘timeness’ metric that I query later on, for example, if I want to choose samples that are ‘timeness’-ier (yikes!).
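Here is one hedged way to get a single queryable ‘timeness’ value that still ranks the longer file highest. The sketch z-scores the time-related columns, projects them onto their first principal axis, and flips the sign so that bigger raw values give bigger scores. This is my own sketch, not an established method. `timeness_scores` and `most_timeness` are hypothetical names, and apart from the two example files the corpus is made up.

```python
import numpy as np

# Same made-up corpus: duration (ms), time centroid (ms).
corpus = np.array([
    [5000.0, 2500.0],
    [400.0, 200.0],
    [1200.0, 700.0],
    [3000.0, 1100.0],
])

def timeness_scores(X):
    """Hypothetical 'timeness': z-scored columns projected onto PC1."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    axis = Vt[0]
    # The sign of a principal axis is arbitrary; flip it so that, overall,
    # larger raw values give larger scores.
    if axis.sum() < 0:
        axis = -axis
    return Z @ axis

def most_timeness(X, k):
    """Row indices of the k highest timeness scores, best first."""
    return np.argsort(timeness_scores(X))[::-1][:k]

print(most_timeness(corpus, 2))
```

Even though the columns are z-scored along the way, the 5000 ms file still gets the highest score, because scaling does not change which file is bigger.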
They’re not so different!
In spirit, I guess, but I’m still struggling with my normal use cases, where I find the nearest match on certain fields and then apply some other criterion to other fields (à la biasing).
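The kind of query I mean could be sketched like this. It is a hypothetical two-stage filter in numpy, not the API of any particular library: first find the nearest neighbours on some columns, then choose among those candidates using another column. The column layout and the loudness values are invented for the example.

```python
import numpy as np

def biased_match(X, target, match_cols, bias_col, k=3, prefer="max"):
    """Hypothetical two-stage query:
    1. find the k rows nearest to `target`, using only `match_cols`;
    2. among those k, return the row with the largest (or smallest) `bias_col`."""
    sub = X[:, match_cols]
    dist = np.linalg.norm(sub - np.asarray(target, dtype=float), axis=1)
    candidates = np.argsort(dist)[:k]
    values = X[candidates, bias_col]
    pick = np.argmax(values) if prefer == "max" else np.argmin(values)
    return int(candidates[pick])

# Columns: duration (ms), time centroid (ms), loudness (dB, made up).
X = np.array([
    [5000.0, 2500.0, -20.0],
    [400.0, 200.0, -12.0],
    [1200.0, 700.0, -30.0],
    [3000.0, 1100.0, -18.0],
    [1100.0, 650.0, -10.0],
])

# Nearest 2 by time centroid to 600 ms, then prefer the loudest of those.
print(biased_match(X, [600.0], match_cols=[1], bias_col=2, k=2))
```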