Returning to this today and I've hit a bit of a snag: the regression I'm doing before this stage refuses to converge on standardized/robust-scaled data.
I know the loss value doesn't mean anything specific in absolute terms (is it at least relative to network size or something concrete?*).
If I take my "real" 8d of perceptual descriptors (loudness/centroid/flatness/pitch) and normalize them, I get a loss of 0.086 with `fluid.mlpregressor~`, which seems "good" to me. If I apply robust scaling instead (and switch to `@activation 3`), the best result I got with the same network settings was around 6.25. I also tried `@activation 0` and that wasn't any better. I guess having the outliers pushed out to the edges, past the standard deviations, doesn't make the regressor happy.
SO
I can run the regressor on normalized data, then take the normalized data and robust-scale it to prep it for biasing, but that seems weird/wasteful. Would that be in line with what you're suggesting, though?
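To make sure I'm describing the same thing you are, here's a rough numpy sketch of that two-stage scaling, with hand-rolled stand-ins for what I understand `fluid.normalize` and `fluid.robustscale` to do (min-max to [0, 1], and median/IQR centring respectively; that's my assumption about their behaviour, not something from the docs):

```python
import numpy as np

def normalize(x):
    # Min-max scale each column to [0, 1] -- my stand-in for fluid.normalize.
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

def robust_scale(x):
    # Centre each column on its median and divide by its IQR --
    # my stand-in for fluid.robustscale.
    med = np.median(x, axis=0)
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    return (x - med) / (q3 - q1)

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 8))  # stand-in for my 8d descriptor data

normalized = normalize(descriptors)      # the regressor trains on this...
bias_input = robust_scale(normalized)    # ...and this robust-scaled copy feeds the biasing
```

So the regressor only ever sees the normalized version, and the robust-scaled version is derived from it purely for the biasing step.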
*I got the best results so far with `@hiddenlayers 6 4 2 4 6` with 8d of data on either side, which seems odd to me, as that's a huge network for such a low number of dimensions. Is the loss value proportional to the number of points (entries × dimensions) in the dataset, or to the network size, or a combination of both?
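For what it's worth, here's a quick numpy sketch of my hunch about that last question. If the reported loss is something like a mean squared error over the output elements (an assumption on my part, I haven't confirmed what `fluid.mlpregressor~` actually reports), then it's an average per element, so it shouldn't grow with the number of entries, but it does grow with the square of the target range, which would also explain part of the normalized-vs-robust-scaled gap:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake targets: normalized to [0, 1], and the same data stretched to 10x the range.
targets_norm = rng.uniform(0.0, 1.0, size=(100, 8))
targets_wide = targets_norm * 10.0

# Fake predictions that are off by the same 10% of each range.
preds_norm = targets_norm + 0.1
preds_wide = targets_wide + 1.0

def mse(pred, target):
    # Mean squared error averaged over every element -- independent of
    # how many rows there are, but not of the target scale.
    return float(np.mean((pred - target) ** 2))

mse_norm = mse(preds_norm, targets_norm)  # ~0.01
mse_wide = mse(preds_wide, targets_wide)  # ~1.0: 100x larger for the same relative error
```

Same relative error, 100x the loss, purely from the scaling, which is roughly the kind of jump I'm seeing.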