So pretty much the entire time I've used fluid.mlpregressor~ I've tested/varied the @activation based on what I'm sending in (typically tanh for -1, 1 and relu for 0, 1), but I've always left @outputactivation at the default linear output (honestly, I didn't even know there was a second activation attribute until a buddy mentioned it the other day).
The reference isn't especially useful in clarifying things, as this is the text for @activation:
An integer indicating which activation function each neuron in the hidden layer(s) will use.
And the @outputactivation is:
An integer indicating which activation function each neuron in the output layer will use. Options are the same as activation.
So is this just a matter of selecting something that will mesh well with whatever kind of inverse transform is on the output of the regressor (e.g. using tanh on the output if you used standardize/robustscale, or relu if you used normalize), or is there something more at play here?
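To make the question concrete, here's a rough Python/numpy sketch of what I mean by "meshing" the output activation's range with the scaler's range (nothing FluCoMa-specific; the scaling formulas are just illustrative stand-ins):

```python
import numpy as np

# Rough output ranges of the activations mentioned above
activations = {
    "identity": lambda z: z,                 # (-inf, inf) -- the default linear output
    "relu":     lambda z: np.maximum(0, z),  # [0, inf)
    "tanh":     np.tanh,                     # (-1, 1)
}

# Targets after two hypothetical scalings of the same data
raw = np.random.randn(1000) * 3.0 + 5.0
normalized = (raw - raw.min()) / (raw.max() - raw.min())  # squashed into [0, 1]
standardized = (raw - raw.mean()) / raw.std()             # mean 0, std 1, unbounded

# Is the question just: does the activation's range cover the target's range?
z = np.linspace(-5, 5, 11)
for name, f in activations.items():
    out = f(z)
    print(f"{name:8s} range on [-5, 5]: {out.min():6.2f} .. {out.max():6.2f}")

print("normalized targets:  ", normalized.min(), "..", normalized.max())
print("standardized targets:", standardized.min(), "..", standardized.max())
```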
I guess the actual @activation is tied into the gradient descent and has an impact on how the network builds, functions, and has its loss computed, but the @outputactivation only applies to a single layer. Is it also included in the loss computation and/or gradient descent, etc.?
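For reference, here's the picture I have in my head of where the output activation sits in a plain MLP (pure numpy, nothing FluCoMa-specific, and tanh as the output activation purely for illustration). If it's applied in the forward pass like this, I'd assume its derivative also shows up when the loss is backpropagated:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MLP: 2 inputs -> 4 hidden units (tanh) -> 1 output
W1, b1 = rng.standard_normal((4, 2)), np.zeros(4)
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)

def out_act(z):       # stand-in for the @outputactivation slot
    return np.tanh(z)

def out_act_grad(z):  # its derivative, needed if it takes part in backprop
    return 1.0 - np.tanh(z) ** 2

x = rng.standard_normal(2)
target = np.array([0.3])

# Forward pass: the output activation is the last thing applied before the loss
h = np.tanh(W1 @ x + b1)                 # hidden layer -> @activation
z2 = W2 @ h + b2
y = out_act(z2)                          # output layer -> @outputactivation
loss = 0.5 * np.sum((y - target) ** 2)   # MSE on the *activated* output

# Backward pass: the chain rule goes through out_act's derivative,
# so (in this picture) the output activation does affect the gradients
dL_dy = y - target
dL_dz2 = dL_dy * out_act_grad(z2)
dL_dW2 = np.outer(dL_dz2, h)
print("loss:", loss)
print("gradient wrt W2:", dL_dW2)
```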
In short: how should one use @outputactivation vs @activation?