Ok, looks like there are details of the workflow here that aren't clear to me, and I'd missed that the motivation for using an MLP was to deal with the furthest sensor (although whether an MLP can help with that depends on *how* the data is junk), rather than as a way of avoiding a closed-form expression for mapping coordinates to arrival times and, possibly, coping with extra nonlinearities in the physical system.
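For reference, the closed-form mapping I was imagining is just t_i = ||p - s_i|| / c for a source at p and a sensor at s_i (assuming a homogeneous medium with wave speed c, which is my assumption rather than anything from your setup), with the relative onsets being t_i - min_j(t_j); the appeal of an MLP was dodging having to invert that by hand.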
In my mind, the data flow was just [calculate relative onset times] → [mlp], but it sounds like something more complex is happening? I'm not sure what the lag map is, for instance.
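To make that concrete, here's a minimal sketch of the pipeline I was picturing, in Python rather than Max just for compactness; the sensor count, geometry and wave speed are all made-up placeholders:

```python
# Minimal sketch of [relative onset times] -> [mlp], with synthetic data.
# Everything numeric here (4 sensors, wave speed, layout) is a placeholder.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
sensors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # corners
c = 1500.0  # stand-in wave speed

def relative_onsets(p):
    """Arrival time at each sensor, minus the earliest, for a source at p."""
    t = np.linalg.norm(sensors - p, axis=1) / c
    return t - t.min()

# synthetic training set: random strike positions -> lag features
positions = rng.uniform(0.0, 1.0, size=(200, 2))
lags = np.array([relative_onsets(p) for p in positions])

# one hidden layer of 10 neurons, as in [fluid.mlpregressor~ @hiddenlayers 10]
model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
model.fit(lags, positions)

print(model.predict(relative_onsets(np.array([0.3, 0.7]))[None, :]))
```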
If the most distant sensor is problematic, then it's probably worth stepping back from MLPs and looking at that: is it noisier measurements (so, stochastic in character) and/or added nonlinear effects? Having some idea of what's causing the dog-shittiness should help figure out what to do about it.
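One quick way to tell those apart, sketched under the assumption (mine) that you can strike the same spot repeatedly: noise shows up as spread around the lag a simple distance/speed model predicts, nonlinearity as a consistent offset.

```python
# Hedged diagnostic sketch: repeated strikes at one known spot.
# A noisy-but-honest sensor gives lags scattered around the predicted value;
# a nonlinear effect gives a systematic offset. Data below is simulated.
import numpy as np

rng = np.random.default_rng(1)
predicted_lag = 0.0042          # lag the simple distance/speed model expects (s)
measured = predicted_lag + rng.normal(0.0, 0.0003, size=50)  # stand-in data

bias = measured.mean() - predicted_lag   # systematic error -> nonlinearity?
spread = measured.std()                  # random error -> noise?
print(f"bias: {bias:.6f} s, spread: {spread:.6f} s")
# bias ~ 0 with big spread -> mostly noise; big stable bias -> model mismatch
```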
Meanwhile, with respect to the MLPs:
Really 10 layers? I used a single layer of 10 neurons, i.e. [fluid.mlpregressor~ @hiddenlayers 10]. But it may be academic if we've established that we're not comparing even vaguely equivalent inputs / outputs. In any case, you usually want more observations than neurons (really, than free parameters), so 9 training points isn't going to be that reliable; and if it converges significantly worse with *more* training data, that points to something funky with the data, or a network topology that isn't up to the job.
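As a back-of-envelope illustration of why 9 points is thin (the 4-in / 2-out shape here is my guess, not your patch):

```python
# Parameter count for a single hidden layer of 10 neurons.
# Input/output sizes are illustrative guesses (4 lag features -> 2D position).
n_in, n_hidden, n_out = 4, 10, 2
n_params = (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)
print(n_params)  # 72 weights + biases, against only 9 observations
```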
Standardizing will make the job of the optimizer easier, but shouldn't, on its own, make the difference between converging and not.
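And for completeness, this is all I mean by standardizing, using sklearn as a stand-in for the equivalent FluCoMa step; the toy data is made up:

```python
# Standardize lag features to zero mean / unit variance before training.
# Better-conditioned inputs help the optimizer; they don't fix bad data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X_train = rng.uniform(0.0, 0.01, size=(50, 4))   # placeholder lag features (s)
y_train = rng.uniform(0.0, 1.0, size=(50, 2))    # placeholder positions

scaler = StandardScaler().fit(X_train)           # fit on training data only
model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
model.fit(scaler.transform(X_train), y_train)

# apply the SAME transform at prediction time:
X_new = rng.uniform(0.0, 0.01, size=(1, 4))
print(model.predict(scaler.transform(X_new)))
```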
Can you just send me your raw(est) training data and patches offline? I don’t think I’m going to get a grasp on exactly what’s going on otherwise.