Ways to test the validity/usefulness/salience of your data

For the most part my corpora haven’t been more than 1–3k samples or so (for most of my use cases it’s been longer, “one-shot”-style samples rather than individual grains, which can get into the 100k+ territory in C-C-Combine). So I guess that, at that size and with high dimensionality, the speed benefit of a pre-fit space would be dubious anyway.

A native (e.g. fluid.dataset~-based) brute-force search object would be handy for use cases like that. I’d probably still harp on about having some kind of interface to bias/fork the matching, though…
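Something like the sketch below is roughly what I mean: plain brute force over a few thousand entries, with an optional per-dimension weight to bias the match (or zero out columns to “fork” it onto a subset of descriptors). This is a hypothetical Python/numpy mock-up, not an existing fluid.* object, and all the names in it are made up.

```python
# Hypothetical brute-force, biasable nearest-neighbour search over a small
# corpus (1-3k entries). Not a real FluCoMa object, just an illustration.
import numpy as np

def brute_force_match(corpus, query, weights=None):
    """Return (index, squared distance) of the corpus entry closest to `query`.

    corpus  : (n_entries, n_dims) array of analysis frames
    query   : (n_dims,) target frame
    weights : optional (n_dims,) per-dimension bias, e.g. emphasise loudness
              columns over MFCCs, or set columns to 0 to ignore them entirely
    """
    diff = corpus - query
    if weights is not None:
        diff = diff * weights
    dists = np.einsum('ij,ij->i', diff, diff)  # row-wise squared distances
    best = int(np.argmin(dists))
    return best, float(dists[best])

# Toy usage: 2000 entries x 24 descriptors, heavily bias the first 2 columns
corpus = np.random.rand(2000, 24)
query = np.random.rand(24)
w = np.ones(24); w[:2] = 4.0
idx, d = brute_force_match(corpus, query, w)
print(idx, d)
```

At a couple of thousand entries this kind of exhaustive search is trivially fast, which is the point: no fitting step, nothing to keep in sync, and the weights are just a list you can change per query.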

Just building each analysis/recipe saps most of the energy/life out of me, due to the sheer number of objects, the error-proneness, and then, when mirroring offline/realtime analyses, making sure all the fit files are stored and line up etc…

It’s also kind of hard to gauge how effective something is if it returns a match every time and the sources/targets are relatively disparate. Perhaps the answer is just doing assessable matching (e.g. the “time travel” idea I was on about before), since there I can point to recipes and say “yes, this one works better”. That died off for similar reasons, though: wanting to change any bit of the analysis (lower-order MFCCs in that case) meant revamping so, so much.
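For what it’s worth, by “assessable” I just mean something where the right answer is known, so each recipe gets a number rather than a vibe. A toy, made-up sketch (nothing to do with the actual time-travel patch, just the same spirit): query the corpus with slightly perturbed copies of its own entries and count how often each recipe recovers the original entry.

```python
# Toy "assessable matching" test: each recipe (a column subset, invented here
# purely for illustration) is scored by how often it recovers the right entry.
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.random((1000, 24))            # 1000 entries x 24 descriptors
recipes = {
    "all 24 dims": slice(0, 24),
    "first 13 dims": slice(0, 13),
    "last 11 dims": slice(13, 24),
}

def accuracy(cols):
    sub = corpus[:, cols]
    queries = sub + rng.normal(0, 0.02, sub.shape)   # perturbed "targets"
    hits = 0
    for i, q in enumerate(queries):
        d = ((sub - q) ** 2).sum(axis=1)             # brute-force distances
        hits += (d.argmin() == i)                    # did we get entry i back?
    return hits / len(corpus)

for name, cols in recipes.items():
    print(f"{name}: {accuracy(cols):.2%} correct matches")
```

Crude, but it’s the kind of thing that lets you say “this recipe works better” with a straight face.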