New Object Early Community Build #3 and 4-5-6: FluidDataSeries, and DTW-based classification and regression - NOW IN SC TOO

OK, this one is a big one - actually a new concept, FluidDataSeries, which is a dataset of time series, plus the first two ways of dealing with them: finding distances between series (via DTW), and therefore being able to find nearest neighbours, and therefore being able to find classes (classification) and interpolated values (regression).
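
For anyone curious what "distances via DTW" means under the hood, here is a minimal sketch of the classic dynamic-programming DTW distance between two series of frames. This is an illustration of the algorithm only, not FluCoMa's actual implementation; all names are mine.

```python
import math

def dtw(a, b):
    """Dynamic Time Warping distance between series a and b.
    Each series is a list of frames; each frame is a list of floats."""
    def dist(x, y):  # Euclidean distance between two frames
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of b
                                 cost[i][j - 1],      # skip a frame of a
                                 cost[i - 1][j - 1])  # match both
    return cost[n][m]
```

The point of the warping is that a time-stretched copy of a gesture still comes out at (near-)zero distance, which is what makes the nearest-neighbour queries below meaningful for gestures drawn at different speeds.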

Now, please follow the (very ugly and terse and messy) tutorials in order to understand them. At the moment it is brute-forcing, so it won’t work well with huge data series either. It is a very early release, so we can see where this can go in the next months - not in the next release :slight_smile:

I don’t need to be told the patches are ugly :smiley: But if you stall despite reading the text and following the numbers, that I need to know, so I can try to devise some sort of learning material. Also let me know if you find an interface discrepancy vs your usual fluid-coding style.

Max only for now; macOS fat binary and PC, though. SC should be in soon, probably next week, or earlier if I have insomnia on the plane.
SC is now available too, see below.


Original download link redacted - see below for an updated version.


This seems very cool!

I can see the distance querying being very useful in a lot of my use cases since the contour/gesture is often the most relevant bit, rather than the specific descriptors.

For the classifier/regressor bits, I tried messing with it a bit but couldn’t quite make sense of the logic/routing. Is it possible to classify/regress as you go? As in, while drawing a shape, have constant output, potentially with some kind of confidence/distance metric?

I found myself wanting to see the regressor wiggle around as I drew the input, rather than waiting for me to finish (handwriting detection style).

Obviously a better comparison (/classification/regression) can be made once the whole time series is complete, but can it match portions of a gesture as it goes?

re legibility:
As is my way of understanding things, I found patches 2/3 way more indicative of what was happening than patch 1. In particular, patch 1’s steps 1/2/3/4/5/6 loop around and are scattered all over the screen, with some steps having you do something five times and others just one press. It was very easy to get “lost” in the communication alone.

Be aware that for now it is brute force, so large corpora might be a problem. @lewardo had plans to do fast DTW and might still be able to if I manage to convince him of the benefits of volunteering (or if you bribe him).
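
To make concrete what "brute force" implies here: a query has to run a full DTW against every stored series and keep the nearest one, so cost grows linearly with corpus size (times the quadratic cost of each DTW). A hypothetical sketch, not the FluCoMa API; the distance function is passed in to keep it generic:

```python
def nearest(query, dataset, distance):
    """Brute-force 1-nearest-neighbour over a corpus of series.
    dataset: dict mapping label -> stored series.
    distance: e.g. a DTW distance function.
    Returns (best_label, best_distance)."""
    best_label, best_dist = None, float("inf")
    for label, series in dataset.items():
        d = distance(query, series)  # every comparison is a full distance
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist
```

The returned label gives classification directly; for regression one would instead look up the output values attached to the nearest (or k nearest) series.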

LSTM is on its way for that.

You could implement a running gesture query - this is just one example. Dataseries allow you to add and remove items at various positions, so you could have a rolling thing. I plan to code one soon-ish as a demo.
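
One way the "rolling thing" could be structured (a hypothetical sketch, not the FluCoMa interface; all names here are mine): keep only the last N input frames and re-run the nearest-series query each time a new frame arrives, so the best match updates while the gesture is still being drawn.

```python
from collections import deque

class RollingQuery:
    """Re-query on every incoming frame over a sliding window of recent frames."""

    def __init__(self, size, dataset, distance):
        self.window = deque(maxlen=size)  # oldest frames fall off automatically
        self.dataset = dataset            # label -> reference series
        self.distance = distance          # e.g. a DTW distance function

    def feed(self, frame):
        """Add one incoming frame and return the current best-matching label."""
        self.window.append(frame)
        series = list(self.window)
        return min(self.dataset.items(),
                   key=lambda item: self.distance(series, item[1]))[0]
```

This gives continuous output at the cost of re-running the brute-force query per frame; pairing it with some distance threshold would give the confidence-style metric asked about above.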

Thanks - it is hacky indeed. But did the comparison with dataset (in 1) and with knnreg/class in 3/2 help you understand?


I’ve not tested anything with this, but I remember back when doing entrymatcher vs fluid.kdtree~ comparisons (thread), the brute force was still quite performant until things got really, really big (with the KDTree being faster only above a certain size). (Actually, revisiting that thread, brute force was faster nearly all the time, up to 200k entries, which is as far as I tested from the looks of it.)

Ooh, nice. I can see that being useful for some of the time stuff I was trying to work out a while back (predicting time series).

Yeah I had a quick poke at this but there’s a bunch I don’t understand about the interface yet and kind of moved onto the next example.

I always prefer contextual examples rather than didactic ones, so I definitely understood after examples 2/3, though those both had the “press one, now press two, now press three, now press four” bouncing-around thing, which is harder to follow.


ok messages for Max and SC users:

Max: there is an interface change: predictSeries makes more sense, as we do not predict from a point. I also corrected typos and stuff. (4.5 MB)

SC: there is now a version of FluidDataSeries, DTWClassifier, and DTWRegressor. The latter two do not yet have a .kr version, but you can do some fun stuff with small FluidDataSeries - if anyone is on Windows or Linux and wants to try, let me know. (4.2 MB)

Feedback welcome.




It has arrived - maybe @tedmoore, @spluta and/or @mccrmck will be happy :smiley:


A couple bits:

  • You should remove the zl change in example 2 for the p playback. I kept trying to play the same segment again to compare, only to realize it wasn’t working.
  • I feel like the segmentation here was different before? Each segment now has a huge click at the end (the start of the next segment). I don’t remember that being the case.

The rest looks good and works well on M2 mac/Max.

Good point - the zl change was good in another patch :slight_smile:

Thanks for the amazing work, just tried them out, they look very sweet!


I think the nearest-neighbour finding on time series is quite good (the drum example), and @lewardo has promised to volunteer an implementation of fast-dtw…

In the meantime there is a fun new thing coming too (the last of the summer) in the shape of LSTM. He and I are troubleshooting the last bits of the interface - stay tuned!


OK people, I think I will start a new thread to merge this one and this other one: SC

@mccrmck has found a bug in the printing, I fixed it.

More importantly, he found two interface decisions that I also find confusing, and I want to propose fixes:

  1. getFrame should really be called getPoint to stick to DataSet - a frame is only a point in time, so no need for a new concept… or do we need one? Frames are Points at a given Time?

  2. the printing (and JSON dump) prepends a T to the frame number to stress that it is a Point in Time - but when we query, we query only with its numerical value, for instance 2 and not T0002. Maybe we should either:
    2.1 print without T
    2.2 query with the trimmings

Any feedback welcome. I know very few of you have tried this but you might still have strong opinions :slight_smile:


My 2¢:

  1. I think getFrame could work so long as the documentation makes the analogy to getPoint… they get passed a different number of args; a distinction between the methods could be useful, perhaps?

  2. I like 2.1! The T was a good pedagogical hint the first time I .printed, but not really necessary otherwise.

Thanks for the bug fix also! :slight_smile:


I agree.

I think a ‘frame’ is different from a ‘point’ in that it has a context that a ‘point’ does not, so having a different term (and method) for each makes sense to me.


Please see this thread now: