Fluid.buf2trainingdatasets (I know, terrible name)

Hey,
I made a slightly more convenient abstraction for creating the input/target training chunks for fluid.mlpregressor~ for timeseries forecasting. I am open to suggestions if anyone comes up with a better name. It should work, but scream if it doesn’t!

Best,
B
fluid.buf2trainingdatasets.zip (5.1 KB)

Hi @balintlaczko,

I’ve been thinking about implementing some similar kinds of helpers for SuperCollider. Could you write a sentence or two about what this does, and maybe I can make an SC version? (Thanks; your sentence will be much more efficient for me than me tracing a Max patch.)

Thanks,

T

@tedmoore I plan to test the other code in this thread too, and I’m mostly in SC now, so I should port it in the next few days… unless @balintlaczko gives you the sentence you need and you beat me to it!

Hey @tedmoore!

Great that you will do that! I just realized I need to make a slight change in the js code so that the last chunks are still sane; I’ll repost the corrected version in this thread soon!
The idea is that you have a multichannel buffer with some recorded data (for example, a 3-channel buffer with xyz coordinates). Since MLPs are not originally meant for time series prediction, we have to “hack” some temporal representation of the data. One of the simplest ways to do this is to chunk up the timeseries into windows, where the target is the same window offset by one step. The size of this window (which I call @chunksize in the Max abstraction, but I am open to other suggestions) will be the extent of our “fake” memory. So if your timeseries is “1234567” and chunksize=3, we should get the following chunks:
| step | input | target |
| --- | --- | --- |
| 0 | “123” | “234” |
| 1 | “234” | “345” |
| 2 | “345” | “456” |
| 3 | “456” | “567” |
[At the moment it actually goes on further, with 0 values at the end, so I’ll have to fix that.]
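In plain pseudocode (Python here just for illustration, not the actual Max/js implementation), the intended single-channel behaviour boils down to:

```python
series = "1234567"
chunksize = 3

# slide a window of chunksize over the series; the target is the same
# window shifted one step into the future
for step, i in enumerate(range(len(series) - chunksize)):
    print(step, series[i:i + chunksize], series[i + 1:i + 1 + chunksize])
# 0 123 234
# 1 234 345
# 2 345 456
# 3 456 567
```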
I’ll try to pseudo-codify the full (multichannel) version here; there is also a code sketch of it at the end of this post:
0. Take the source buffer and chunksize, plus two fluid.dataset~s: one will hold the input data, the other the target data.
1. Flatten the source buffer into an internal one with fluid.bufflatten~; let’s call this databuffer. The number of channels in the original source buffer can be stored as numfeatures. You will also need a counter for the labels, and an index to remember where we are in the databuffer (if you want to follow my implementation). You can also clear the fluid.dataset~s, so that this whole op can be repeated without them complaining.
2. while index+(numfeatures*chunksize) < databuffer.length: (here I forgot the “+(numfeatures*chunksize)” in the original version)
    2a. Save databuffer[index:index+(numfeatures*chunksize)] as an entry, with the counter as its label, in the fluid.dataset~ for the inputs.
    2b. Save databuffer[index+numfeatures:index+numfeatures+(numfeatures*chunksize)] as an entry, with the counter as its label, in the fluid.dataset~ for the targets.
    2c. counter += 1 and index += numfeatures
3. After the while loop has finished, output a “done” event (or doneaction?).

EDIT: so after some testing I realized that the while loop condition should be while index+(numfeatures*chunksize) < databuffer.length:
EDIT 2: I should read my own code. (Made some other changes above.) Sorry for spamming.
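
And here is the sketch I promised above: the same logic on a plain flattened list, in Python purely for illustration (the two dicts just stand in for the input and target fluid.dataset~s, so none of these names are the real objects).

```python
# Sketch of the chunking logic on a flat, frame-interleaved list
# (frame 0's features, then frame 1's, and so on).

def buf2trainingdatasets(databuffer, numfeatures, chunksize):
    inputs, targets = {}, {}
    counter, index = 0, 0
    windowlen = numfeatures * chunksize
    # step 2: stop once the target window would run past the end of the buffer
    while index + windowlen < len(databuffer):
        inputs[counter] = databuffer[index:index + windowlen]                               # 2a
        targets[counter] = databuffer[index + numfeatures:index + numfeatures + windowlen]  # 2b
        counter += 1          # 2c
        index += numfeatures  # 2c
    return inputs, targets
```

With the “1234567” example above (numfeatures=1, chunksize=3) this gives exactly the four input/target pairs in the table.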

Updated version, with a didactic example in the help patch and without my silly debug patching.
fluid.buf2trainingdatasets.zip (6.2 KB) 🙂

This is great, thanks for sharing. It goes one step further than what we discussed last week, as it is continuous instead of the arbitrary segmentation of the LPT structure: a sort of multiplexed Markov chain, which reminds me of the hack that old Max people will remember using in [anal].

I’m curious whether you have tried a similar data structure with a kdtree? With 5000 points you are likely to have a lot to offer with 3 past for 1 future. You could even use knnregressor; it’ll definitely train faster 🙂
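
Roughly what I have in mind, as a plain Python sketch of the lookup (made-up names, not actual FluCoMa calls):

```python
# Rough sketch only: nearest-neighbour forecasting with the same chunked data.
# `inputs` / `targets` are stand-ins for the two datasets built above
# (dicts mapping a label to a flat list of values).

def knn_forecast(inputs, targets, query, k=1):
    """Find the k input windows closest to `query` and average their targets."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(inputs, key=lambda label: sqdist(inputs[label], query))[:k]
    dim = len(targets[nearest[0]])
    # k=1 locks to the single nearest match ('quantised states');
    # larger k averages (interpolates) the matched targets.
    return [sum(targets[label][i] for label in nearest) / len(nearest) for i in range(dim)]
```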

I’m not saying it’ll work, but it is definitely a cool thing to try too, I think. As soon as I’m done with the other 390429058 things on my plate I might try it.

Thanks for sharing!

Yes, the good old anal-prob! That example is still in the help patch (if that’s what you’re thinking of).

I haven’t tried it yet, but great idea! It might even have better accuracy with the multidimensional series. Will try it soon!

Indeed!

Our implementation has only one output, but with 3 past to the next future, that should do it, and be fun. You can interpolate in two different ways as well, or lock in the single nearest for a more ‘quantised states’ approach.
