Removing outliers while creating classifiers

Ok, it looks like there are details of the workflow here that aren’t clear to me, and I’d missed that the motivation for using an MLP was about the furthest sensor (although whether or not an MLP could deal with that depends on in what way the data is junk), rather than as a way of avoiding having to find a closed-form expression for mapping coordinates to arrival times and, possibly, of dealing with extra nonlinearities in the physical system.

In my mind, the data flow was just [calculate relative onset times] → [mlp], but something more complex is happening? I’m not sure what the lag map is, for instance.

If the most distant sensor is problematic, then it’s probably worth stepping back from MLPs and looking at that: is it noisier measurements (so, stochastic in character) and/or added nonlinear effects? Having some idea of what’s causing the dog-shittiness should help figure out what to do about it.

Meanwhile, with respect to the MLPs:

Really 10 layers? I used a single layer of 10 neurons, i.e. [fluid.mlpregressor~ @hiddenlayers 10]. But it may be academic if we’ve established that we’re not trying even vaguely equivalent inputs / outputs. In any case, you usually want more observations than neurons (so 9 training points isn’t going to be that reliable), but if it converges significantly worse with more training data then that points to something either funky with the data, or a network topology that isn’t up to the job.

Standardizing will make the job of the optimizer easier, but shouldn’t necessarily make the difference between being able to converge or not.
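
For reference, standardizing is just (x − mean) / std per input dimension; a minimal pure-Python sketch (the lag values are made-up placeholders, not real training data):

```python
import math

def standardize(columns):
    """Standardize each column (feature) to zero mean, unit variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        var = sum((x - mean) ** 2 for x in col) / len(col)
        std = math.sqrt(var) or 1.0  # guard against constant columns
        out.append([(x - mean) / std for x in col])
    return out

# hypothetical lag features (in samples) for a handful of training hits
lags_nw = [12.0, -3.0, 7.0, 0.0, -9.0]
lags_ne = [4.0, 15.0, -2.0, 8.0, 1.0]
std_nw, std_ne = standardize([lags_nw, lags_ne])
```

The optimizer then sees all inputs on a comparable scale, which usually speeds convergence without changing what’s learnable.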

Can you just send me your raw(est) training data and patches offline? I don’t think I’m going to get a grasp on exactly what’s going on otherwise.


Ah gotcha, yeah should have been clearer.

The whole process is as follows (for the 4-sensor array, but similar for the narrower 3-sensor array):

  1. independent onset detection for each channel
  2. lockout/timing section to determine which onset arrives first and block errant/double hits
  3. cross correlating each adjacent pair of sensors
  4. send that to “the lag maps”*
  5. (find centroid of overlapping area)
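
For reference, step 3 boils down to finding the shift that maximizes the cross-correlation of two onset windows. A rough pure-Python sketch (not the actual Max implementation, and the toy signals are made up):

```python
def xcorr_lag(a, b, max_lag):
    """Return the lag (in samples) of b relative to a that maximizes
    their cross-correlation; positive means b arrives later than a."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# toy example: b is a copy of a delayed by 5 samples
a = [0.0] * 64
a[10], a[11], a[12] = 1.0, 0.5, 0.25
b = [0.0] * 64
b[15], b[16], b[17] = 1.0, 0.5, 0.25
```

With `xcorr_lag(a, b, 10)` the estimated lag comes out as 5 samples, matching the delay that was baked into `b`.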

The lag map approach is something @timlod tested and I then implemented in Max (with a bit of dataset transformation help from @jamesbradbury).

Basically, each lag map is pre-computed on a 1mm grid, giving what the lag should theoretically be for each sensor pair. In jitter it ends up looking like this:

[Screenshot 2024-06-09 at 3.34.16 PM: the two pre-computed lag maps rendered in jitter]

On the left is the NW pair and on the right is the NE pair.

These were computed in a similar way to what you’ve suggested: what they should theoretically be based on drum size, speed of sound, tuning, etc. So these maps are drum/tuning/setup-specific.
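
For anyone following along, the pre-computation is pure geometry: for each grid point p, the theoretical lag for a sensor pair (s1, s2) is (|p − s1| − |p − s2|) / wave speed, converted to samples. A hypothetical sketch, where the sensor positions, head radius, wave speed, and sample rate are all made-up placeholders (membrane wave speed is tuning-dependent, not the speed of sound in air):

```python
import math

def lag_map(s1, s2, radius, step, wave_speed, sr):
    """Theoretical lag (in samples) between sensors s1 and s2
    for every grid point inside a circular drum head."""
    grid = {}
    n = int(radius / step)
    for ix in range(-n, n + 1):
        for iy in range(-n, n + 1):
            x, y = ix * step, iy * step
            if math.hypot(x, y) > radius:
                continue  # outside the head
            d1 = math.hypot(x - s1[0], y - s1[1])
            d2 = math.hypot(x - s2[0], y - s2[1])
            grid[(x, y)] = (d1 - d2) / wave_speed * sr
    return grid

# hypothetical 14" head (0.3556 m diameter), 1mm grid
nw = lag_map(s1=(-0.12, 0.12), s2=(0.12, 0.12),
             radius=0.1778, step=0.001, wave_speed=100.0, sr=48000)
```

Points equidistant from the two sensors (the perpendicular bisector) get lag 0, points nearer one sensor go negative, nearer the other positive, which is exactly the banded gradient visible in the jitter matrices.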

Once the cross correlation values are computed, rather than sending them to all four lag maps, it takes the index of whichever sensor picked up the onset first (step 2 above), with the thinking being that those TDoAs would be the most accurate/correct. That then does some binary jitter stuff to get the overlapping area:

[Screenshot 2024-06-09 at 3.34.40 PM: the binary overlap of the two lag maps]

Then some more jitter stuff to find the centroid of the little overlapping area.
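
Outside of jitter, the mask-and-intersect step is roughly: keep the grid points whose theoretical lag is within some tolerance of the measured lag for each pair, AND the two point sets together, then average the surviving coordinates. A toy sketch with made-up 3x3 maps standing in for the real 1mm-grid ones:

```python
def candidate_points(lag_grid, measured_lag, tol):
    """Grid points whose theoretical lag is within tol of the measurement."""
    return {p for p, lag in lag_grid.items() if abs(lag - measured_lag) <= tol}

def centroid(points):
    """Mean position of a set of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# toy maps on a 3x3 grid (lag values are invented for illustration)
nw_map = {(x, y): x - y for x in range(3) for y in range(3)}
ne_map = {(x, y): x + y for x in range(3) for y in range(3)}

# measured lags of 1 for each pair, zero tolerance
overlap = candidate_points(nw_map, 1, 0) & candidate_points(ne_map, 1, 0)
hit = centroid(overlap)
```

In the toy case the two bands intersect in a single cell, so the centroid is just that point; with real maps and a nonzero tolerance you get the little overlapping blob from the screenshot, and the centroid averages it down to one coordinate.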


So the overall idea is to ignore the furthest reading, as the cross correlation isn’t as accurate there (nor is onset detection).

The MLP’s role is to correct for non-linearities in how this approach behaves close to the edge of the drum. The lag map approach is super accurate near the middle and does quite well for a good way out; it’s only the furthest hits that get a bit more jumpy/erratic. I(/we) suspect there’s some physics stuff at play near the edge, since the tension of the drum changes and energy bounces/behaves differently given the circular shape, etc. So the physical model, lag maps, and probably the quadratic version, can’t really account for that, at least at the level of complexity at which they’re generally implemented. Nearer the center of the drum it’s effectively an infinite plane of vibrating membrane, which seems to behave quite predictably.

There’s also the side perk that you wouldn’t need an accurate physical model/lag map to work from, as you could strike the drum at known locations and have the NN figure out the specifics.

Yeah I misspoke. A single layer with 10 neurons.

I’ll send you the data and test patches offline (will tidy the patch up a bit first).

Actually this is correct. But getting relative onset times is the hard part here - if onsets detected across all channels aren’t aligned, nothing will work (they could even be consistently wrong, but consistency is the hard part).
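
One cheap consistency check that might help flag misaligned datapoints: for any single hit, pairwise lags around a closed loop of sensors should sum to (near) zero, since lag(A→B) + lag(B→C) + lag(C→A) = (tB − tA) + (tC − tB) + (tA − tC) = 0. A sketch (the tolerance value is an arbitrary placeholder):

```python
def loop_closure_error(lag_ab, lag_bc, lag_ca):
    """For consistent TDoAs, lags around a sensor loop must cancel out."""
    return abs(lag_ab + lag_bc + lag_ca)

def is_consistent(lag_ab, lag_bc, lag_ca, tol=2.0):
    # tol in samples; a real threshold would depend on sr and noise level
    return loop_closure_error(lag_ab, lag_bc, lag_ca) <= tol
```

Datapoints that fail this check are exactly the ones that would contradict each other during training, so filtering on it (or just logging it) could separate an alignment problem from a network problem.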

Note also that lag maps really are only a thing because numerical optimization (solving multilateration equations) appears to be hard to achieve within Max - that’s why I came up with the approach to pre-compute all the possible lags within a given accuracy, and index into them to get the position of a hit.
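
For context, the numerical solve that the lag maps sidestep is just minimizing the squared TDoA residuals over candidate positions. A hypothetical sketch using a crude coarse-to-fine grid search (a proper solver would use least-squares, and the sensor layout, wave speed, and hit position below are all made up):

```python
import math

def tdoa_residual(p, pairs, wave_speed):
    """Sum of squared differences between measured and predicted TDoAs.
    pairs: list of ((s1, s2), measured_dt) with dt in seconds."""
    err = 0.0
    for (s1, s2), dt in pairs:
        d1 = math.hypot(p[0] - s1[0], p[1] - s1[1])
        d2 = math.hypot(p[0] - s2[0], p[1] - s2[1])
        err += ((d1 - d2) / wave_speed - dt) ** 2
    return err

def solve(pairs, wave_speed, radius, levels=4):
    """Coarse-to-fine grid search over a square covering the head."""
    cx, cy, half = 0.0, 0.0, radius
    for _ in range(levels):
        step = half / 10
        best = None
        for ix in range(-10, 11):
            for iy in range(-10, 11):
                p = (cx + ix * step, cy + iy * step)
                e = tdoa_residual(p, pairs, wave_speed)
                if best is None or e < best[0]:
                    best = (e, p)
        cx, cy = best[1]
        half = step  # zoom in around the current best point
    return cx, cy

# toy setup: four hypothetical sensors, a known hit, noise-free TDoAs
c = 100.0
sensors = [(-0.1, 0.1), (0.1, 0.1), (0.1, -0.1), (-0.1, -0.1)]
hit = (0.03, -0.02)

def dt(s1, s2):
    return (math.hypot(hit[0] - s1[0], hit[1] - s1[1])
            - math.hypot(hit[0] - s2[0], hit[1] - s2[1])) / c

pairs = [((sensors[i], sensors[(i + 1) % 4]),
          dt(sensors[i], sensors[(i + 1) % 4])) for i in range(4)]
est = solve(pairs, c, radius=0.15)
```

With clean lags this recovers the hit position to well under a millimetre; the lag map approach is effectively the same search with the per-point lags cached ahead of time.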

So really I’m sure that the issue here is that the onset timings aren’t aligned properly - this means that different datapoints will contradict each other, making convergence hard/impossible. I’ll discuss this with Rodrigo in person soon, so we’ll know for sure!

The 3 vs. 4 sensor thing is more interesting from a technical standpoint here, as the 4-sensor data is easier to align (3 out of the 4 are closer to the actual hit, which yields data more amenable to alignment with cross-correlation).

I have to ponder this a bit more later, but I think zeroing out the last sensor might work in theory, at least if we don’t use a bias (my example converged without using a bias). Then again, 0 holds meaning in this case (same distance to sound source)…
There might be something more elegant I haven’t thought of yet - at the worst we could train several networks, one for each configuration.
In PyTorch there’s also a prototype MaskedTensor implementation which might be just the thing for this, although I haven’t used it before.
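
The several-networks fallback is at least mechanically simple: key the models by which sensor fired first and route each hit accordingly. A sketch with stand-in models (in a real patch these would be four trained regressors, e.g. fluid.mlpregressor~ instances):

```python
def first_sensor(onset_times):
    """Index of the sensor whose onset arrived first."""
    return min(range(len(onset_times)), key=lambda i: onset_times[i])

# stand-in "models", one per first-sensor configuration; each takes the
# lags from the remaining sensor pairs and returns a dummy tagged result
models = {i: (lambda lags, i=i: ("model", i, lags)) for i in range(4)}

def predict(onset_times, lags):
    """Route a hit to the network trained for its configuration."""
    return models[first_sensor(onset_times)](lags)

result = predict([0.004, 0.001, 0.003, 0.007], [5, -2, 1])
```

Here sensor 1 fired first, so the hit goes to model 1; each network only ever sees fixed-size, same-meaning inputs, which dodges the zero-means-something ambiguity entirely.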
