Regression for time series/controller data?

Thanks!

It’s super fun to play with, and given the paradigm (sampling-based), it can end up sounding radically different.

Well, it ran for however long it needed to last night and I got an error of… 0. Which doesn’t seem right. I tried running a single point and after like 1min of churning, it returned an error of 0. So I suspect I fucked something else in the patch along the way.

With regard to the actual prediction part of the patch: since I'm not using the same bouncing ball, the logic in place there doesn't seem to work right. Is the idea that I ask for a point, and once it's returned, I ask for another point, with a speedlim-esque thing capping that querying at 20ms a go? So ideally I would ask for new points at whatever rate the training data originally went in at? (So if my data stream is a new point every 2ms, I would, ideally, ask for a new point every 2ms so the chunks and everything line up as they should.)

Will close/reload everything and try again now.

Ok, I’m definitely fucking something up along the way.

After the chunking step I get datasets with the following contents.

fluid.dataset~ ball.target contains:

rows: 5021 cols: 90
0 0.51603 0.47579 0.4732 … 0 0 0
1 0.51603 0.47579 0.4732 … 0 0 0
2 0.51603 0.47579 0.47472 … 0 0 0

5018 0.67166 1 0.26528 … 0 0 0
5019 0 0 0 … 0 0 0
5020 0 0 0 … 0 0 0

and fluid.dataset~ ball.input contains:

rows: 5021 cols: 90
0 0 0 0 … 0 0 0
1 0.51603 0.47579 0.4732 … 0 0 0
2 0.51603 0.47579 0.4732 … 0 0 0

5018 0.67166 1 0.26876 … 0 0 0
5019 0.67166 1 0.26528 … 0 0 0
5020 0 0 0 … 0 0 0

I’ve set the chunk size to 10, and the features to 9 (for my controller stream). I also set the initial recording to bl.gest.record~ ball_trajectory 9, so I think that should be correct. Don’t know why the datasets would be so empty.

The training happened really quickly now too, and gave me an error of -1. So I guess that’s lower than 33, but I don’t think it’s supposed to go in that direction.

O-o, I haven’t had that, could that be some sort of bug?

So the prediction pipeline in the ball example is:

  1. Choose a random entry from the training input dataset (it can actually be the target too, doesn’t matter; you can even try with some random data), put that in the query buffer, send predictpoint to the regressor, and put the result in the “result” buffer.
  2. When the regressor has delivered the prediction, we fetch the values as a list from the result buffer. Since our network i/o is like “[xy,xy,xy…xy] -> [xy,xy…newxy]”, we have to chop off the last two values and feed those to the ball (they are the actual predicted xy).
  3. Then we can put this list (unchanged) into the query buffer and ask for a prediction again.
  4. …and then we repeat 2-3.

The rate-limiting thing was just so that we don’t get the values too fast for the animation. I was actually sloppy there: the “perfect” solution would have been to rate-limit at 50ms (since my recording rate was 50ms) and interpolate between values (with a 1-step delay), so that I get the same kind of result as the training data.
But since points 2 and 3 above create a feedback loop, you have to do some sort of rate-limiting or gating.
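If it helps, here’s the same loop written out as a little Python sketch (predict_point and draw_ball are made-up stand-ins for the Max plumbing, not real calls):

```python
import random
import time

def run_prediction_loop(predict_point, draw_ball, seed_points, steps=500, rate_ms=50):
    # 1. Seed the query with any training entry (or even random data).
    query = random.choice(seed_points)
    for _ in range(steps):
        # 2. Ask the regressor for the whole next window.
        result = predict_point(query)
        new_xy = result[-2:]       # the last two values are the freshly predicted x/y...
        draw_ball(new_xy)          # ...and those are what actually drive the ball
        # 3. Feed the full prediction straight back in as the next query.
        query = result
        # 4. Rate-limit so the feedback loop runs at (roughly) the recording rate.
        time.sleep(rate_ms / 1000.0)
```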

What do you mean? The shapes look good to me. The only thing is that I haven’t actually dealt with the remainders at the end of the chunking, which with this many features could pollute the dataset a bit, so I recommend chopping off the last numfeatures (in this case 9) entries from both datasets. I will fix this in an update to the abstraction thingy. Also get rid of the first entries, since I can see the input starts with an empty slot(?).

They write somewhere in the docs or examples that the -1 means the training terminated with an error.

I also think maybe there is an issue with presenting the button states like that, though in principle it should be OK. But just as a test, try a session without the button data, only the continuous stuff. And try just 3 memory steps; maybe there is an issue with the i/o layers being so big (90 nodes). [I can imagine that this “fake time series” trick with the chunking gets less and less efficient the more features you have, since the corresponding numbers in the sequence are further and further apart - but maybe that’s only a problem for humans, dunno. Or it might be confusing to the network in this implementation, where we cannot use arrays of arrays but instead have to flatten everything into a single array.]
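For reference, the framing we’re talking about boils down to something like this (a minimal NumPy sketch of my understanding of the chunking - a sliding 10-frame window with a one-frame offset between input and target - not the actual abstraction code):

```python
import numpy as np

def make_chunked_pairs(frames, chunk=10):
    """frames: array of shape (n_frames, n_features), e.g. 9 controller values per frame."""
    frames = np.asarray(frames)
    n_frames, n_features = frames.shape
    X, Y = [], []
    # Stop before the ragged remainder so no trailing-zero entries get in
    # (this also covers the trimming of first/last entries mentioned above).
    for i in range(n_frames - chunk):
        X.append(frames[i:i + chunk].ravel())          # 10 frames * 9 features = 90 values
        Y.append(frames[i + 1:i + 1 + chunk].ravel())  # same window, shifted one frame ahead
    return np.array(X), np.array(Y)
```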

Let me know how it goes!


At this point I’m thinking it’s more of a meat-space error as adapting stuff in a complex-ish patch like this can be tricky when I’m not entirely sure where all the plumbing goes.

Riiiiight. That makes loads more sense. I think the very first time I fed it data I got a whole long list of stuff out (that didn’t seem to change), and wasn’t sure what it was supposed to do. Or more importantly, how it was supposed to do it.

Right right. I hadn’t considered that those would be the “ends” of your chunks. I’ve just had issues in other places where something went wrong and I had loads of dimensions filled with zeros, which were indicative of something getting fucked earlier in the process.

I’ll try doing another version with just the continuous data, leaving out the binary stuff to see if that helps. Will report back my findings!

Not any better it seems.

My fluid.dataset~s now look like this:

rows: 5094 cols: 50
0 0 0 0 … 0.47472 0.44634 0
1 0 0 0 … 0.47472 0.44634 0
2 0 0 0 … 0.47472 0.44634 0

5091 0.67166 1 0.26528 … 0 0 0
5092 0 0 0 … 0 0 0
5093 0 0 0 … 0 0 0

and

rows: 5094 cols: 50
0 0 0 0 … 0.4732 0.44634 0
1 0 0 0 … 0.47472 0.44634 0
2 0 0 0 … 0.47472 0.44634 0

5091 0.67166 1 0.26876 … 0 0 0
5092 0.67166 1 0.26528 … 0 0 0
5093 0 0 0 … 0 0 0

So something kind of gappy is happening somewhere.

My flattened buffer~ looks ok, but I’m guessing something went wrong in the chunking. These are the settings I used at that step:

Don’t know if this is related or not, but the training step still returns a -1. I’m using these settings:
fluid.mlpregressor~ @hidden 128 64 32 @activation 0 @outputactivation 0 @maxiter 10 @learnrate 0.1 @momentum 0.1 @batchsize 1 @validation 0

I think the weird fluid.dataset~ contents may be down to the sampling rate. I just realized that I’ve been at 48k because Zoom has been pushing my computer around.

Ran the first half again and got a more reasonable looking dataset:

rows: 2538 cols: 50
0 0.51603 0.47579 0.4732 … 0.51501 0.40648 0
1 0.51603 0.47579 0.47472 … 0.53216 0.40648 0
2 0.51603 0.47579 0.47745 … 0.6349 0.39686 0

2535 0.67166 1 0.26876 … 0 0 0
2536 0 0 0 … 0 0 0
2537 0 0 0 … 0 0 0

Still can’t train on it though. (same -1 issue)

-1 means you don’t converge. So back to your learning rate, data norm, activations… same old same old…


Perhaps a more useful message (an error message, even) might be helpful here, since the “error” rate isn’t actually -1, as that’s not possible. It’s just using that information path to communicate error, process completion, and the fact that it’s not converging.

I think we will disagree on interface forever, but I can live with that. -1 is a good programmatic way to let you do a small number of iterations and check whether you are going in the right direction…

OK, I just got time to try it… and it is Max 8 dependent! I’ll see if I get the courage to recode it for Max 7… @balintlaczko let me know if it would be hard to share what is happening in the two mc.gen~ objects.

“Failure”, I would think, would warrant a message or error of some type. I think that’s a solid convention across the rest of the objects.

I think the structure around the mc.-ing will be the harder bit to replicate, but here are the guts of those two patches (one mc.gen~ and one mc.gen):

Screenshot 2020-10-27 at 10.51.36 pm

Screenshot 2020-10-27 at 10.52.03 pm

I’ll consider an error message (or warning, more appropriately) as well, but as @tremblap says, the -1 is helpful for responding programmatically, and the need to go through rounds of tuning and tweaking ANNs to find the sweet spot of rapid-yet-stable convergence is, lamentably, unavoidable.

For the problem at hand, I’d possibly start by (a) trying to train bit by bit (chunk by chunk), checking the result to see if it ever converges at all, and (b) turning down the learning rate.
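In pseudo-code, the “bit by bit” idea is just something like this (the fit/learn-rate names are stand-ins for the corresponding fluid.mlpregressor~ messages and attributes, and the numbers are arbitrary):

```python
def train_bit_by_bit(regressor, inputs, targets, rounds=100, good_enough=0.05):
    err = -1
    for _ in range(rounds):
        # One short burst of training per call (e.g. a small @maxiter).
        err = regressor.fit(inputs, targets)
        if err == -1:                     # -1: this round did not converge...
            regressor.learn_rate *= 0.5   # ...so turn the learning rate down and try again
        elif err < good_enough:           # converging and low enough: stop here
            break
    return err
```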

shouldn’t be too bad, both in poly~s


I can see that being programmatically useful, and I guess that’s the nature of these kinds of processes. What happens in typical ML contexts? Is the lack of convergence denoted via the error amount?

Even though it says it in the help file, I only noticed it after going back to check where @balintlaczko saw it, since it’s not something I’ve seen before from a Max object (as opposed to the way -1 is used elsewhere to denote a generic or inherited state).

I guess a(n intentional) nan could be a thing too, as you can then use it programmatically, but as a user you’d be more inclined to go ‘wtf’, and I guess it’s more mathematically correct (?) if it fails to converge.

Hey all, sorry for the little radio silence!

@tremblap: yes, sorry, the bl.gest.record~ abstraction uses mc. The mc.sig~ -> mc.gen~ listpoke is just my way of putting the incoming list into a buffer. You could try a workaround with fluid.list2buf, or make a poly~ with sig~ -> gen~ - although in that case a simple poke~ will be enough.

The other mc.gen, recordGesture, basically pokes each sample of the buffer we just wrote the incoming list into across the different channels of the same index in the buffer where we store the full recording. (So it’s kind of the inverse of fluid.bufflatten~, followed by storing that multichannel sample in the recording.)

But generally bl.gest.record~ is intended as a convenience device that just continuously listens to your input and, if there is any new data, records it into a new slot in a polybuffer~. The waittime parameter defines how much idle time passes before it decides to finish (and crop) the current recording. So the idea is that you just keep doing stuff, it records your “gestures” whenever you do something, and then you have all your recordings in one place (the polybuffer~).
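In other words, the behaviour is roughly this (a hedged Python sketch of what the abstraction is described as doing, not its actual implementation; the list of lists stands in for the polybuffer~ slots):

```python
import time

class GestureRecorder:
    def __init__(self, waittime=0.5):
        self.waittime = waittime       # seconds of idle before a recording is finished
        self.recordings = []           # stands in for the polybuffer~ slots
        self.current = None
        self.last_input = 0.0

    def feed(self, frame):
        # New data after an idle period opens a fresh recording slot.
        if self.current is None:
            self.current = []
        self.current.append(frame)
        self.last_input = time.time()

    def tick(self):
        # Call regularly; after `waittime` of silence the current gesture is
        # closed (effectively "cropped") and stored as its own recording.
        if self.current is not None and time.time() - self.last_input > self.waittime:
            self.recordings.append(self.current)
            self.current = None
```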

It’s not thoroughly tested though; I only needed it for one project, so I hope you won’t get bugs… (I already fixed some before I posted the example project, haha…)

Here is a screenshot of the help patch if it helps:

As for the training problems, @rodrigo.constanzo would you mind posting a file with your recorded data in some form? It is easier for me to see what the problem is if I try it myself.

A small piece of user feedback from me: I think it would be cool to also be able to address the activation functions with symbols, like @activation relu. It’s not a big deal, just thought it could be clearer/more comfortable for some.

And @rodrigo.constanzo, you might want to try swapping the chunking part of the patch for this, which is more streamlined and doesn’t produce unnecessary trailing-zero entries in the datasets.

This whole conversation also made me think of implementing a similar, but LSTM-based, pipeline in tensorflow.js, so it can be hosted in a node.script (and ultimately in an abstraction). I generally don’t trust Node inside Max, but maybe they have fixed the performance issues since I last used it. Then we could cross-compare the time series predictions of fluid.mlpregressor~ and a stack of LSTM nets. No promises about when, though, but hopefully soon!
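For the curious, the kind of thing I mean is roughly this (a Keras sketch of the LSTM stack; the tf.js version would mirror it, and the layer sizes are just placeholders):

```python
import tensorflow as tf

# Shapes assume 9 controller features and a 10-step window, as in the example above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 9)),                    # (timesteps, features): no flattening needed
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(9),                         # predict the next frame directly
])
model.compile(optimizer="adam", loss="mse")
```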


Nothing monumental to add to the conversation here, save to welcome @balintlaczko to the group and say thanks for this energetic and interesting contribution!

Another interesting point of comparison for learning time series could be with the stuff Chris Kiefer did a few years back with Echo State Networks.

https://sro.sussex.ac.uk/id/eprint/51860/1/NIME2014-ESN.pdf

These are recurrent networks, like LSTMs, but they don’t require backprop to train. There’s a Pd object in Chris’s repo, but I did (at one point) have a working Max port that I’d also started to try and extend with online updates. Perhaps I should dust it off…
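For anyone who hasn’t met them: the appeal is that only a linear readout gets trained, so “training” is a single ridge regression rather than backprop. A bare-bones NumPy sketch of the idea (nothing to do with Chris’s actual code):

```python
import numpy as np

class EchoState:
    def __init__(self, n_in, n_res=200, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Rescale the recurrent weights so the largest eigenvalue has the
        # requested magnitude (the usual "echo state property" heuristic).
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W, self.n_res = W, n_res

    def _states(self, inputs):
        x = np.zeros(self.n_res)
        out = []
        for u in inputs:                       # drive the fixed random reservoir
            x = np.tanh(self.W_in @ u + self.W @ x)
            out.append(x.copy())
        return np.array(out)

    def fit(self, inputs, targets, ridge=1e-6):
        X = self._states(inputs)               # reservoir states, one row per timestep
        # Train only the readout, with ridge regression (no backprop anywhere).
        self.W_out = np.linalg.solve(X.T @ X + ridge * np.eye(self.n_res), X.T @ targets).T

    def predict(self, inputs):
        return self._states(inputs) @ self.W_out.T
```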


The patch I posted above has a coll with 5000 entries in it, along with a little visualizer thing for seeing what that data corresponds to. It is 9d though (5 continuous and 4 binary), so as per your suggestions my other testing sliced off the last 4. I just found it easier to test with this pre-recorded coll rather than grabbing the controller each time I tested things out.

I second this as well. I think @jamesbradbury might have made some similar suggestions a while back. If you’re not super immersed in the lingo/jargon, seeing it as a numerical flag is confusing: I assumed it was referring to the @tapin/@tapout functionality, where @activation 0 was listening to the first layer, etc…

It gets funkier, but this is also the case for fluid.mds~ where there are loads of different @distancemetrics.

Back in the flow of work week/teaching, but this is on my radar to try testing. It’s just tricky to track where failures may be happening as I’m not entirely sure what the output would be at any given step.

Yes, I just realized that you already did, sorry. I will try to make something with it later tonight, see where it goes.

Thanks, feels great to be part of this!

Oh I would really love a windows build (or the source) for that, if you don’t mind! (<3)


Hey all. So here is my test with the xbox data:
xbox_test_mlp.zip (1.4 MB)
Outline:

In p model there is my “usual” tf/keras setup implemented in Max: it has an early-stopping callback, it optionally saves the new best weights to JSON, and it has a stop threshold (error = 0.01). My best weights turned out to be around 0.0998, which is not so good, but not terrible.
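For anyone following along, the keras original that the Max patch mimics looks roughly like this (layer sizes and filenames here are illustrative, not what’s in the zip):

```python
import numpy as np
import tensorflow as tf

# Stand-in for the chunked controller data: 90 values in, 90 values out.
X = np.random.rand(5000, 90).astype("float32")
Y = np.random.rand(5000, 90).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(90,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(90, activation="sigmoid"),   # data is normalised to 0-1
])
model.compile(optimizer="adam", loss="mse")

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="loss", patience=20),          # early stopping
    tf.keras.callbacks.ModelCheckpoint("best_weights.h5", monitor="loss",
                                       save_best_only=True),                # keep the best weights
]
model.fit(X, Y, epochs=1000, batch_size=32, callbacks=callbacks)  # stop by hand once loss < 0.01
```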

The prediction always “freezes” to a single point after a while, and normally this is a sign of insufficient training (it trained for a few hours on a single core). But here I also think it’s an issue of how we frame the problem. So we have a 10-step memory. From this perspective the input data is almost “only” idle, with a few ramp-ups/downs every now and then. I trained with the button data included (even though the 9th dim never had a single “on” state - [right?]), but as I expected, the network didn’t pick up too much from it.

Here is a little example of the inference:

The buttons never fire at all. I included the weights json in the zip, so you can try without training.

I think the data needs to be represented in a better way, filtering out the idle stuff (and possibly dropping the button data or representing it in some other way). Everything has to be represented so that this simple mlp - having a “memory” of N steps - can somehow have an overview of the whole dataset. Maybe it would be worth trying to train this with the idle parts filtered out, and then training another network on the same data that learns when to be active and when to idle.
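As a first pass, filtering the idle stretches could be as simple as something like this (a hedged NumPy sketch; the threshold would need tuning to the controller data):

```python
import numpy as np

def drop_idle(frames, threshold=1e-3):
    """Keep only frames where at least one feature moved more than `threshold`
    since the previous frame, so the training data is mostly 'active' material."""
    frames = np.asarray(frames)                            # shape: (n_frames, n_features)
    change = np.abs(np.diff(frames, axis=0)).max(axis=1)   # biggest per-frame change
    keep = np.concatenate(([False], change > threshold))   # first frame has no predecessor
    return frames[keep]
```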

Also, since our i/o vector size is 90 (9 dimensions times a 10-step chunk), I don’t think any of the hidden layers should be smaller than that. I might be completely wrong, but I just can’t imagine what happens to 90 numbers squished through 32 nodes and then expanded back to 90 in the output layer.

Also, I have a feeling that the @activation attribute for fluid.mlpregressor~ is 1-based - could that be?
EDIT: actually, now I’ve looked at it again, and the 0th slot is called “Identity” - does anyone know what that means? Anyway, I was using @activation 1, which I thought was relu, but now I understand that it was sigmoid. :slight_smile: Maybe the reference could be clearer about this (it does not mention “identity” and lists them in the wrong order - assuming that some user like me infers the order from the description).

Next, I will try the knn approach suggested by @tremblap, and will see how it goes. :slight_smile:
