Well, bearing in mind one of your other current threads, a bit of both: if you can pare down the input dimensions to get rid of redundancy to start with (e.g. by dropping correlated or low-variance features) and then find a minimal chain of processes that gets you somewhere useful, that’s the dream.
The loss for the MLP doesn’t depend on the network architecture, but it does depend on the range of your target values (i.e. whatever it is you’re fitting to). The loss value is the sum of the squared differences between each prediction and its training point, divided by the number of samples in the batch. So the range of the loss will scale with the dimensionality of the output vector, but it’s also nonlinear: i.e. because the differences are squared, bigger errors will look much scarier.
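If it helps to see that arithmetic spelled out, here’s a quick Python sketch of the calculation as described above (just the maths, nothing FluCoMa-specific; the function name and toy data are mine):

```python
import numpy as np

def batch_loss(predictions, targets):
    """Sum of squared differences per sample, averaged over the batch.

    predictions, targets: arrays of shape (n_samples, n_dims).
    """
    per_sample = np.sum((predictions - targets) ** 2, axis=1)  # sum over output dimensions
    return float(np.mean(per_sample))                          # average over the batch

# Toy example: 4 samples with 76 output dimensions in 0-1
rng = np.random.default_rng(0)
preds = rng.random((4, 76))
targs = rng.random((4, 76))
print(batch_loss(preds, targs))  # grows with the number of output dimensions
```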
So, sure, you can totally work out some ballpark ideas of what’s reasonable: if you have 76 dimensions in the range 0-1, then clearly 76 would be a bad, bad number. If they’re all in the range 0-2, then the maximum loss becomes 2² × 76 (304). If the ranges are heterogeneous, you can still use fluid.normalize either to normalize the data, or just to peek at the ranges in the training set (but then it’s harder to reason about the overall wrongness).
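As a sketch of that ballpark reasoning (again just illustrative Python, with made-up names), the worst case is every output dimension being wrong by its full range:

```python
def worst_case_loss(dim_ranges):
    """Upper bound on the per-sample loss: every dimension maximally wrong.

    dim_ranges: list of (lo, hi) pairs, one per output dimension.
    """
    return sum((hi - lo) ** 2 for lo, hi in dim_ranges)

print(worst_case_loss([(0.0, 1.0)] * 76))  # 76
print(worst_case_loss([(0.0, 2.0)] * 76))  # 304
```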
Playing around with your MFCCs earlier whilst tinkering with the patch above, I found I was getting more rapid convergence if I normalised, so I knew the outputs were in 0-1. In the screenshot it’s reporting a loss of 1.7-odd, which would average out at an error of ~0.15 (√(1.7/76)) in each dimension, which isn’t (yet) wonderful but not totally horrible either. To get that down to a 10% average error would mean a loss of 0.76 (0.1² × 76, in this case).
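The same back-of-envelope conversion in Python form (assuming normalised 0-1 outputs and the loss as described above):

```python
import math

def avg_error_per_dim(loss, n_dims):
    """Rough average per-dimension error implied by a given loss value."""
    return math.sqrt(loss / n_dims)

def loss_for_error(err, n_dims):
    """Loss value corresponding to a desired average per-dimension error."""
    return err ** 2 * n_dims

print(avg_error_per_dim(1.7, 76))  # ~0.15
print(loss_for_error(0.1, 76))     # 0.76
```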