Speed comparison with dimensionality reduction and fluid.kdtree~

Sadly we didn’t end up doing a breakout session, but I wanted to transfer a bit of the discussion over from Slack, where @a.harker was getting some slow querying out of fluid.mlpregressor~ when using it as an auto-encoder.

I did some testing myself and found that:

I get around 28ms per predictpoint query. This is using the same settings that are in 8c, trained on a 19d → 2d mapping with just 10 entries. I got the error down to 0.02 before testing this.

So this is a bit of a broader question about where an auto-encoder falls in the overall scheme of things with regard to speed.

Based on some of the discussion in the thread about transformpoint and fluid.mds~, where it became clear that a transformpoint isn’t possible given the algorithm (MDS doesn’t learn a reusable mapping it can apply to unseen points), and with @weefuzzy’s comments there in mind:

I’m wondering about doing something like PCA→NN to get from a ~200d space down to 2d: taking a huge chunk down with PCA and then covering the last bit with fluid.mlpregressor~.
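
For what it’s worth, here’s a minimal sketch of that kind of two-stage chain in Python with scikit-learn, standing in for the PCA step and fluid.mlpregressor~ (all the dimensions, layer sizes, and data below are made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

# Stand-in corpus: 500 points in a ~200d descriptor space.
X = np.random.rand(500, 200)
# Stand-in 2d targets (in practice these would come from wherever
# the 2d space comes from, e.g. an auto-encoder bottleneck).
y = np.random.rand(500, 2)

# Stage 1: PCA takes the huge chunk down, e.g. 200d -> 10d.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

# Stage 2: a small network covers the last bit, 10d -> 2d.
mlp = MLPRegressor(hidden_layer_sizes=(6,), max_iter=2000)
mlp.fit(X_reduced, y)

# Querying a new point is then one PCA projection plus one forward pass.
new_point = np.random.rand(1, 200)
print(mlp.predict(pca.transform(new_point)))
```

The nice property of this split is that both stages have cheap, deterministic transforms for new points, which is exactly what fluid.mds~ can’t offer.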

At the moment even going from 19d → 2d with fluid.mlpregressor~ is “really slow” (~28ms per predictpoint as mentioned above, with a tiny data set).

When @tremblap first presented fluid.mlpregressor~ a few weeks ago during the Friday geek out sessions, he said that mlp stuff takes a long time to compute, but once it’s computed, running data through it is really fast since it’s just simple multiplications.
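
That matches how MLP inference works in general: once the weights are fixed, a prediction is a couple of matrix multiplies plus the activation functions. Here’s a rough numpy sketch for a tiny 19d → 2d network (made-up layer sizes, not FluCoMa’s actual code):

```python
import time
import numpy as np

# Made-up weights for a tiny 19 -> 6 -> 2 network.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((6, 19)), rng.standard_normal(6)
W2, b2 = rng.standard_normal((2, 6)), rng.standard_normal(2)

def predict(x):
    # One forward pass: two matrix multiplies and a tanh.
    return W2 @ np.tanh(W1 @ x + b1) + b2

x = rng.standard_normal(19)
n = 10000
t0 = time.perf_counter()
for _ in range(n):
    predict(x)
print(f"{(time.perf_counter() - t0) * 1000 / n:.4f} ms per forward pass")
```

On any modern CPU that lands deep in the sub-millisecond range, which makes a useful yardstick against the 28ms figure above.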

I don’t know if that holds true for predictpoint as well. Either there’s a bug and/or it’s just not optimized yet (particularly since it’s fairly fresh research-ish code), or predictpoint is an altogether more complex operation that will never be as fast as simply running values through the trained network.
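
One way to probe the overhead-vs-inherently-expensive question outside of Max: compare one-point-at-a-time prediction against a single batched call on the same model. A sketch using scikit-learn’s MLPRegressor as a stand-in (not FluCoMa’s implementation; the sizes mirror the little test above):

```python
import time
import numpy as np
from sklearn.neural_network import MLPRegressor

# Tiny stand-in model: 19d -> 2d, trained on just 10 entries.
X, y = np.random.rand(10, 19), np.random.rand(10, 2)
mlp = MLPRegressor(hidden_layer_sizes=(6,), max_iter=2000).fit(X, y)

queries = np.random.rand(1000, 19)

# One point per call, analogous to repeated predictpoint messages.
t0 = time.perf_counter()
for q in queries:
    mlp.predict(q.reshape(1, -1))
per_point = time.perf_counter() - t0

# All points in one call, analogous to a batch predict over a dataset.
t0 = time.perf_counter()
mlp.predict(queries)
batch = time.perf_counter() - t0

print(f"per point: {per_point * 1000:.1f} ms total / batch: {batch * 1000:.1f} ms total")
```

If the per-point loop is vastly slower than the batch, the cost is in per-call overhead rather than in the multiplications themselves, which would point towards the first explanation rather than the second.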