Ways to test the validity/usefulness/salience of your data

rodrigo.constanzo · February 17, 2022, 1:23pm

Definitely. I started working out a thing like this ages ago: a meta/macro dict to keep track of columns, as well as fit data and other metadata. I even suggested having some native way for the objects to output something similar, but it’s all very brittle if you want to change any single aspect of the setup (say, fewer MFCCs, different FFT settings, a larger analysis window, etc.).

It’s been a while since I’ve done this in practice, but doesn’t the final step of the PCA/SVM-ish thing give you a list of indices (e.g. 4 6 18 21 22 30 48 61)? Then, in order to build a fluid.dataset~ from that (to feed UMAP), you need to go and fetch those individual columns. I guess there’s perhaps some zl-land munging that can happen there, and since it no longer really matters what these columns represent, they could be moved without labels or whatever.
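To make the column-fetching step concrete, here is a minimal sketch in Python/NumPy rather than Max. It assumes the dataset is a plain points-by-features array and that the index list is 0-based; `select_columns`, `source`, and `keep` are hypothetical names for illustration, not anything from FluCoMa.

```python
import numpy as np


def select_columns(data, indices):
    """Return a copy of `data` keeping only the listed columns, in the given order."""
    data = np.asarray(data)
    return data[:, list(indices)]


# Hypothetical stand-in for a dataset: 10 points, 64 descriptor columns.
rng = np.random.default_rng(0)
source = rng.normal(size=(10, 64))

# The index list from the post, assumed here to be 0-based.
keep = [4, 6, 18, 21, 22, 30, 48, 61]

# The reduced array no longer carries any column labels, only the values.
reduced = select_columns(source, keep)
```

If the indices turn out to be 1-based, subtract 1 from each before selecting.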

It just seems like a useful process that would benefit from some kind of native-ish solution like telling fluid.umap~:
weights pca.output 90(%), fittransform source.dataset target.dataset
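One possible reading of the "90(%)" idea is "keep as many principal components as it takes to reach 90% of the variance". A rough NumPy sketch of that reading follows; `components_for_variance` and the demo data are hypothetical, and this is not how any FluCoMa object is known to work.

```python
import numpy as np


def components_for_variance(data, threshold=0.90):
    """Smallest number of principal components whose cumulative
    explained variance ratio reaches `threshold` (between 0 and 1)."""
    data = np.asarray(data, dtype=float)
    centred = data - data.mean(axis=0)
    # Squared singular values of the centred data are proportional
    # to the variance explained by each component.
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    count = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against rounding when the threshold is at or near 1.0.
    return min(count, len(cumulative))


# Hypothetical data: six columns with very different spreads.
rng = np.random.default_rng(2)
demo = rng.normal(size=(200, 6)) * np.array([10.0, 5.0, 1.0, 1.0, 0.5, 0.1])
k = components_for_variance(demo, 0.90)
```

The resulting `k` could then decide how many components (or which columns) get passed on to the UMAP stage.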

Hmm. I could be remembering wrong, but doesn’t PCA return the same results no matter how many components you ask for? Like, the first 5 components of a 5-component PCA are the same as the first 5 of a 500-component one. And doesn’t the dump output of PCA contain the whole matrix either way?
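For an exact PCA computed from a full SVD, that recollection holds: the components come out ordered by variance, and asking for fewer just truncates the list. A small NumPy sketch of the idea is below; `pca_components` is a hypothetical helper, and whether fluid.pca~ is implemented this way is not confirmed here. Randomised or iterative solvers can differ slightly.

```python
import numpy as np


def pca_components(data, n_components):
    """First `n_components` principal axes (as rows) from a full SVD."""
    data = np.asarray(data, dtype=float)
    centred = data - data.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:n_components]


# Hypothetical data: 100 points, 20 features.
rng = np.random.default_rng(1)
data = rng.normal(size=(100, 20))

small = pca_components(data, 5)    # "PCA5"
large = pca_components(data, 20)   # every available component

# The first five axes agree regardless of how many were requested.
same = np.allclose(small, large[:5])
```

The decomposition itself never depends on `n_components` here; the number only decides where the result gets cut off.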

Yeah, that’s interesting. Again, doable by poking and vibing it; it just gets tricky if you’re trying to orient towards a position that satisfies more than one criterion, or using that as a jumping-off point.