A fluid.datasetfilter~ object for conditional querying

After the discussion today (great to re-geek after a long break), I got to thinking about some more use cases for something like this.

Or rather, it highlights ways I presently use entrymatcher that I'd like to carry over into fluid.stuff~, as well as things that would be great to have in the future.

1 - “continuous” filtering of a dataset

This is the most immediate use case, as I'm presently doing stuff like this now: taking any descriptor (say loudness, or a combination of envelope followers at multiple time scales, etc…) and using it to scale or filter a query.

This is something @tremblap suggested ages ago, where depending on how long/busy/whatever I’m playing, I filter the query accordingly. So if I’m playing loudly (as in, either this individual analysis frame has a high loudness and/or a slower envelope follower is above a threshold), I want to select only entries that then meet some criteria. Say, duration > 500 or centroid < 50, etc…

The point here being that once you have more than a single parameter on the input and/or that you want to filter by, it becomes impractical to create individual fluid.dataset~s for each combination. Not to mention a reduction in resolution if you only have a finite number of combinations.
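To make the idea concrete, here's a toy Python sketch (not FluCoMa code; the dataset, column names, and thresholds are all invented for illustration) of filtering a dataset by a condition before running the nearest-neighbour match, rather than pre-building a separate dataset per condition:

```python
import math

# Hypothetical toy dataset: each entry has a few descriptor columns.
dataset = {
    "a": {"loudness": -6.0, "centroid": 40.0, "duration": 700.0},
    "b": {"loudness": -20.0, "centroid": 80.0, "duration": 120.0},
    "c": {"loudness": -3.0, "centroid": 30.0, "duration": 900.0},
}

def filtered_query(target, predicate, cols):
    """Nearest entry among only those passing the predicate."""
    candidates = {k: v for k, v in dataset.items() if predicate(v)}
    def dist(entry):
        return math.sqrt(sum((entry[c] - target[c]) ** 2 for c in cols))
    return min(candidates, key=lambda k: dist(candidates[k]))

# Playing loudly -> restrict the pool to long, dark entries before matching.
playing_loud = True
pred = (lambda e: e["duration"] > 500 and e["centroid"] < 50) \
    if playing_loud else (lambda e: True)
best = filtered_query({"loudness": -4.0, "centroid": 35.0},
                      pred, ["loudness", "centroid"])
```

Because the predicate is an arbitrary function evaluated at query time, any combination of conditions works without multiplying datasets.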

2 - manipulation of the queryable space

This may be solved with the radius feature for fluid.kdtree~, but based on some of the ideas in the corporeal morphology thread, being able to dynamically change the pool of available entries that are queried from would be a super powerful thing.

Radius could kind of tackle that, but for unevenly distributed data points, it could mean the difference between getting no matches and getting all the matches, rather than something proportional like a percentage. Perhaps we will have algorithms that can evenly space the data out, but as far as I know that's not presently the case(?).

So here, it would be something like what @spluta is doing with his joystick. Where he could potentially move around and scale the available points which the incoming audio will then query from.
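The radius problem with uneven data is easy to show in a toy 1-D Python sketch (invented numbers, nothing to do with any real dataset): with a tight cluster plus an outlier, a small change in radius flips from zero matches to all of them.

```python
# A tight cluster near 0 plus one far outlier at 10.
points = {"p1": 0.0, "p2": 0.1, "p3": 0.2, "p4": 10.0}

def radius_query(target, radius):
    """All points within `radius` of the target (1-D)."""
    return sorted(k for k, v in points.items() if abs(v - target) <= radius)

# Querying from 5.0: a radius of 4 catches nothing,
# while a radius of 5 catches everything.
radius_query(5.0, 4.0)   # -> []
radius_query(5.0, 5.0)   # -> ['p1', 'p2', 'p3', 'p4']
```

A proportional control ("give me the nearest 10% of points") would degrade gracefully here, which is why radius alone doesn't quite cover this use case.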

3 - conditional querying based on non-descriptor-space data points

This is similar to #1, but the main difference is the ability to have points in a fluid.dataset~ that can be excluded from direct querying.

So this could perhaps happen using fluid.dataset~ queries, where you create versions of each fluid.dataset~ for kdtree-able data and others for meta-data purposes, but this gets really clunky really quickly, and you end up with "the buffer problem", where you now have dozens of fluid.dataset~s to manage.

A use case here would be the conditional pitch confidence biasing that @tremblap mentioned: if your confidence is above (or below) a certain point, use these data points; otherwise, use those data points. Again, possible with two fluid.dataset~s (for that particular example), but once you add a few more conditionals, or nested conditionals, the interface becomes a messy spiderweb of unpleasant data management.

For something like this, I picture something where, at the querying stage, you can specify which columns (or whatever) you want to search on, as well as having fields that are contextually queryable but not distance queryable (things like duration, or binary flags (e.g. "this is an attack")), etc…
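A toy Python sketch of that split (again, all column names and values are hypothetical, not a real interface): distance is computed only over the descriptor columns, while the metadata columns only ever act as a filter on the pool.

```python
import math

# Each entry mixes kdtree-able descriptors with metadata that shouldn't
# enter the distance (duration, an "is_attack" flag).
dataset = {
    "a": {"centroid": 40.0, "flatness": 0.20, "duration": 700.0, "is_attack": 1},
    "b": {"centroid": 42.0, "flatness": 0.25, "duration": 90.0,  "is_attack": 0},
    "c": {"centroid": 80.0, "flatness": 0.60, "duration": 650.0, "is_attack": 1},
}

def query(target, distance_cols, where):
    """Distance only over distance_cols; `where` may filter on any column."""
    pool = {k: v for k, v in dataset.items() if where(v)}
    def dist(entry):
        return math.sqrt(sum((entry[c] - target[c]) ** 2 for c in distance_cols))
    return min(pool, key=lambda k: dist(pool[k]))

# Match on timbre columns only, but restrict the pool
# to attacks longer than 500 ms.
match = query({"centroid": 45.0, "flatness": 0.3},
              ["centroid", "flatness"],
              lambda e: e["is_attack"] == 1 and e["duration"] > 500)
```

Note that "b" is the closest entry in timbre terms but never gets considered, because it fails the metadata filter.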

4 - variable weighting between time series

This relates to the LPT idea, as well as the “time travel” stuff I’ve been working on, where you may have a fluid.dataset~ that contains an attack, the-bit-after-the-attack, and the sustain (or whatever). In LPT all of these are weighted equally, but it’s not difficult to picture a situation where you may care more about the sharp attack and want to bias that in your query.

Or in the case of the time travel idea, I have a real set of descriptors for samples 0-256, and then a predicted set of descriptors for samples 257-4410. I want to take the latter into consideration, but I don’t want it to have the same weight as the real descriptors.

Now, from what @tremblap hinted at, it will soon be possible to create buffer~s which you can scale and transform, which is fantastic. This could potentially solve the static version of this problem, where I can just apply a scaling to each buffer~ before it's passed to a fluid.dataset~, and can even do it dynamically for new incoming points. But what I can't do is apply this to an existing fluid.dataset~ (as far as I was able to tell from @tremblap's description).

Say, I have a huge dataset of samples that are all preanalyzed and I want to be able to bias the query to put more emphasis on pitch, or MFCCs, or loudness.
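In toy Python terms (a sketch only; weight values and descriptor names are made up), the per-descriptor bias is just a weighted distance applied at query time, so the stored dataset never needs rescaling:

```python
import math

def weighted_nn(dataset, target, weights):
    """Nearest neighbour under a per-column weighted Euclidean distance.
    A weight of 0 removes a descriptor; > 1 biases toward it."""
    def dist(entry):
        return math.sqrt(sum(w * (entry[c] - target[c]) ** 2
                             for c, w in weights.items()))
    return min(dataset, key=lambda k: dist(dataset[k]))

dataset = {
    "x": {"pitch": 60.0, "loudness": -10.0},
    "y": {"pitch": 62.0, "loudness": -30.0},
}
target = {"pitch": 61.5, "loudness": -12.0}

# Equal weights vs pitch-biased weights pick different entries.
equal  = weighted_nn(dataset, target, {"pitch": 1.0,  "loudness": 1.0})  # -> 'x'
biased = weighted_nn(dataset, target, {"pitch": 20.0, "loudness": 0.1})  # -> 'y'
```

The same mechanism covers the time-travel case: give the real descriptors (samples 0-256) a weight of 1 and the predicted ones (257-4410) something smaller, without touching the fixed dataset.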

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

So all of that is to say that some of these "query" things that have been possible for ages with entrymatcher (and even coll) are super powerful, so it would be a shame to have to pick between doing complex querying and having complex structuring (via fluid.stuff~).