Removing outliers while creating classifiers

Ok, looks like there are details of the workflow here that aren’t clear to me, and I’d missed that the motivation for using an MLP was with respect to the furthest sensor (although whether or not an MLP could deal with that depends on in what way the data is junk), rather than as a way of avoiding having to find a closed-form expression for mapping coordinates to arrival times and, possibly, dealing with extra nonlinearities in the physical system.

In my mind, the data flow was just [calculate relative onset times] → [mlp], but something more complex is happening? I’m not sure what the lag map is, for instance.

If the most distant sensor is problematic, then it’s probably worth stepping back from MLPs and looking at that: is it more noisy measurements (so, stochastic in character) and / or added nonlinear effects? Having some idea of what’s causing the dog-shittiness should help figure out what to do about it.

Meanwhile, with respect to the MLPs:

Really 10 layers? I used a single layer of 10 neurons, i.e. [fluid.mlpregressor~ @hiddenlayers 10]. But it may be academic if we’ve established that we’re not trying even vaguely equivalent inputs / outputs. In any case, you usually want more observations than neurons (so 9 training points isn’t going to be that reliable), but if it converges significantly worse with more training data then that points to something either funky with the data, or a network topology that isn’t up to the job.
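
(For anyone who wants to poke at this outside of Max: a single hidden layer of 10 neurons is roughly the following in scikit-learn. The arrays here are made-up stand-ins, not the actual training data.)

```python
# Rough scikit-learn analogue of [fluid.mlpregressor~ @hiddenlayers 10]:
# ONE hidden layer of 10 neurons, not 10 layers.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
lags = rng.uniform(-1.0, 1.0, size=(40, 3))       # e.g. 40 hits x 3 pairwise lags (placeholder)
positions = rng.uniform(-1.0, 1.0, size=(40, 2))  # e.g. (x, y) hit positions (placeholder)

mlp = MLPRegressor(hidden_layer_sizes=(10,),  # a single layer of 10 neurons
                   activation="relu",
                   max_iter=5000)
mlp.fit(lags, positions)
```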

Standardizing will make the job of the optimizer easier, but shouldn’t necessarily make the difference between being able to converge or not.

Can you just send me your raw(est) training data and patches offline? I don’t think I’m going to get a grasp on exactly what’s going on otherwise.


Ah gotcha, yeah should have been clearer.

The whole process is as follows (for the 4 sensor array, but similar for the narrower 3 sensor array) - there’s a rough sketch of the cross-correlation step just after the list:

  1. independent onset detection for each channel
  2. lockout/timing section to determine which onset arrives first and block errant/double hits
  3. cross correlating each adjacent pair of sensors
  4. send that to “the lag maps”*
  5. (find centroid of overlapping area)
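
(As a rough illustration of step 3, here’s how a single pairwise lag could be estimated in Python. The window length and variable names are placeholders, not what the patch actually uses.)

```python
# Minimal sketch: estimate the lag between two sensor channels with
# cross-correlation around a detected onset.
import numpy as np

def pairwise_lag(chan_a, chan_b, onset, win=256):
    """Estimate the lag (in samples) between two channels around an onset."""
    a = chan_a[onset:onset + win]
    b = chan_b[onset:onset + win]
    xcorr = np.correlate(a - a.mean(), b - b.mean(), mode="full")
    return int(np.argmax(xcorr)) - (win - 1)   # 0 = simultaneous arrival
```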

The lag map approach is something @timlod tested and I then implemented in Max (with a bit of dataset transformation help from @jamesbradbury).

Basically, each lag map is pre-computed on a 1mm grid: what the lag should theoretically be at each point for a given sensor pair. In jitter it ends up looking like this:

[screenshot: the two pre-computed lag maps rendered in jitter]

On the left is the NW pair and on the right is the NE pair.

These were computed in a similar way to what you’ve suggested, i.e. what the lags should theoretically be based on drum size, speed of sound, tuning, etc… So these maps are drum/tuning/setup-specific.
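
(For reference, a sketch of how such a map could be pre-computed in Python. The drum radius, sensor positions, wave speed and sample rate below are placeholders rather than the real setup values.)

```python
# For every point on a 1 mm grid, the theoretical lag for a sensor pair is
# the difference in distance to each sensor, divided by the wave speed.
import numpy as np

RADIUS_MM = 180.0            # roughly a 14" drum (placeholder)
WAVE_SPEED_MM_S = 100_000.0  # wave speed in the membrane (placeholder)
SR = 48_000                  # sample rate

sensor_a = np.array([0.0, RADIUS_MM])   # "N" sensor, on the rim
sensor_b = np.array([RADIUS_MM, 0.0])   # "E" sensor, on the rim

coords = np.arange(-RADIUS_MM, RADIUS_MM + 1.0)   # 1 mm grid
xx, yy = np.meshgrid(coords, coords)

dist_a = np.hypot(xx - sensor_a[0], yy - sensor_a[1])
dist_b = np.hypot(xx - sensor_b[0], yy - sensor_b[1])

lag_map = (dist_a - dist_b) / WAVE_SPEED_MM_S * SR   # theoretical lag, in samples
lag_map[xx**2 + yy**2 > RADIUS_MM**2] = np.nan       # ignore points off the head
```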

Once the cross-correlation values are computed, rather than sending them to all four lag maps, it uses the index of which sensor picked up the onset first (step 2 above), the thinking being that those TDoAs would be the most accurate/correct. That then does some binary jitter stuff to get the overlapping area:

[screenshot: the overlapping area produced by the binary jitter step]

Then some more jitter stuff to find the centroid of the little overlapping area.
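
(A numpy stand-in for that binary/centroid step, just to make the idea concrete; the map names and tolerance are placeholders.)

```python
# Keep the grid cells whose theoretical lag is within a tolerance of the
# measured lag for each of the two maps, intersect the masks, and take the
# centroid of the surviving region.
import numpy as np

def locate(lag_map_nw, lag_map_ne, measured_nw, measured_ne, tol=1.0):
    mask = (np.abs(lag_map_nw - measured_nw) < tol) & \
           (np.abs(lag_map_ne - measured_ne) < tol)
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None                      # no cell matches both lags
    return rows.mean(), cols.mean()      # centroid, in grid coordinates
```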

///////////////////////////////////////////////////////////////////////////////////////////////

So the overall idea is to ignore the furthest reading as the cross correlation isn’t as accurate (nor is onset detection).

The MLP’s role is to correct for non-linearities in how this approach behaves close to the edge of the drum. The lag map approach is super accurate near the middle and does quite well for a good way out; it’s only the furthest hits that get a bit more jumpy/erratic. I(/we) suspect there’s some physics stuff at play near the edge, since the tension of the drum changes and energy bounces/behaves differently given the circular shape etc… So the physical model, the lag maps, and probably the quadratic version, can’t really account for that, at least at the level of complexity they are generally implemented at. Nearer the center of the drum it’s effectively an infinite plane of vibrating membrane, which seems to behave quite predictably.

There’s also the side perk that you wouldn’t need an accurate physical model/lag map to work from, as you could strike the drum at known locations and have the NN figure out the specifics.

Yeah I misspoke. A single layer with 10 neurons.

I’ll send you the data and test patches offline (will tidy the patch up a bit first).

Actually this is correct. But getting relative onset times is the hard part here - if onsets detected across all channels aren’t aligned, nothing will work (they could even be consistently wrong - but consistency is the hard part).

Note also that lag maps really are only a thing because numerical optimization (solving multilateration equations) appears to be hard to achieve within Max - that’s why I came up with the approach to pre-compute all the possible lags within a given accuracy, and index into them to get the position of a hit.
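
(For the curious, the optimization that the lag maps sidestep looks roughly like this in Python/scipy; the sensor positions, wave speed and measured TDoAs here are placeholders.)

```python
# Given measured TDoAs, minimise the difference between predicted and
# measured range differences to recover the hit position.
import numpy as np
from scipy.optimize import least_squares

sensors = np.array([[0.0, 180.0], [180.0, 0.0], [0.0, -180.0], [-180.0, 0.0]])  # mm
wave_speed = 100_000.0                        # mm/s (placeholder)
tdoas = np.array([0.0002, 0.0005, 0.0007])    # arrival-time deltas vs. sensor 0 (s)

def residuals(p):
    dists = np.linalg.norm(sensors - p, axis=1)
    return (dists[1:] - dists[0]) - wave_speed * tdoas

fit = least_squares(residuals, x0=np.zeros(2))   # initial guess: centre of the drum
print(fit.x)                                     # estimated (x, y) of the hit
```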

So really I’m sure that the issue here is that the onset timings aren’t aligned properly - this means that different datapoints will contradict each other, making convergence hard/impossible. I’ll discuss this with Rodrigo in person soon, so we’ll know for sure!

The 3 vs. 4 sensor thing is more interesting from a technical standpoint here, as the 4-sensor data is easier to align (3 out of the 4 sensors are closer to the actual hit, which yields data more amenable to alignment with cross-correlation).

I have to ponder this a bit more later, but I think zeroing out the last sensor might work in theory, at least if we don’t use a bias (my example converged without using a bias). Then again, 0 holds meaning in this case (same distance to sound source)…
There might be something more elegant I haven’t thought of yet - at the worst we could train several networks, one for each configuration.
In PyTorch there’s also a prototype MaskedTensor implementation which might be just the thing for this, although I haven’t used it before.
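
One simple middle ground, purely as a sketch (names and shapes made up): keep the zeroed lag but append a per-sensor present/absent flag, so a genuine zero lag (equidistant hit) stays distinguishable from a dropped reading.

```python
import torch

lags = torch.tensor([0.12, -0.30, 0.05, 0.00])   # last sensor dropped -> zeroed
present = torch.tensor([1.0, 1.0, 1.0, 0.0])     # 1 = valid reading, 0 = missing
x = torch.cat([lags, present])                   # 8-dim input for the MLP
```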


Just a little more context wrt. the nonlinearity we’re trying to solve / the motivation for using the NN - here are results from my corrected air mic data, once using trilateration (based on calibrated microphone placements) and once based on training a NN on the same lags.
Here you can clearly see how, next to the close microphone situated at the top/north end of the drum (the setup is 2 overheads and one Beta 57A close mic), the physical model doesn’t detect hits accurately, whereas the NN solves this pretty much completely (one layer of 11 neurons):


[screenshot comparing the trilateration and NN results]
(I had separated this into two pictures, but I’m only allowed to post one, hence this screenshot)

Note that in both cases the later hits going through the center are not real hits, but some theoretical lags I fed it to gauge interpolation performance here - the whole thing is based on just 40 hits (4 at each of the 10 drum lugs).


I’ve still not had a chance to really burrow into this. Just to note that the shape in the right-hand picture looks very suggestive of a sign-flip somewhere (which, I guess, is a built-in challenge with solving quadratics).


Huh, thanks for the heads-up.
I know I had some issues with sign at the beginning of the project, but (at least thought that I) solved those at the time. However, this was already quite some time ago now, and I wouldn’t put it past myself to have snuck some error back in :slight_smile:
I will double-check this just in case!

Edit: Just double-checked it. There’s no sign flip/the equations are fine.
Actually, in this specific case the calibrated mic placement is very close to there being two solutions along that particular path, with one being the correct one (at the top/towards the rim of the drum) and the other at the bottom respectively. I think it’s a bit of a pathological case - in principle, when fixing the z-axis, I thought there should only be unique solutions, but I guess that with some measurement error I’ve come across a case where it’s easy for the optimizer to stop at the wrong one (potentially because of the provided initial guess?).
In any case, thanks for pointing this out - I was lazy in not trying to understand exactly what the non-linearity here was (tbf this was the most striking case after fixing my data; leading up to this there were a bunch of noisier results that just looked like ‘hard to model in general close to the close mic’ on this dataset).
It might help me design the calibration, as well as the later optimization, a bit better to prevent this from happening! That is, if I don’t end up just using the NN for its ease of use.