On feature scaling/data sparseness (in a specific spectral context)

If you get wild results then lowering the learning rate should help to smooth out the change between epochs and to make sure you don’t miss local minima. I find that when the learning rate is too high (for processes that have a learning rate) it is akin to loading a spring with too much force (it may fly off in any direction).

Now I’m sure @weefuzzy will have wiser approaches… My trick, which is dirty, is to set it at 0.1 and check after only 1000 iterations. If it bounces, i.e. if the returned error is going up and down, I divide by 10 and continue; when it bounces again, I divide by 10 again. That way I progressively reach a point where the bouncing stops (a rough sketch of that loop is below).
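In Python-ish terms the loop is roughly this - `train(lr, n_iter)` and the bounciness test are stand-ins of mine, not anything real in the toolkit:

```python
def is_bouncy(errors):
    # Call the run unstable if the error rose between consecutive
    # readings more often than it fell.
    ups = sum(b > a for a, b in zip(errors, errors[1:]))
    return ups > len(errors) / 2

lr = 0.1
while lr > 1e-7:
    errors = train(lr, n_iter=1000)  # hypothetical: returns the error at each iteration
    if not is_bouncy(errors):
        break      # stable at this order of magnitude, keep training here
    lr /= 10       # still bouncing: drop an order of magnitude and retry
```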

@groma told me that was crazy and restless, and that he uses smaller values and more iterations.

Finding a decent learning rate can be frustrating. Values as small as 1e-5 or 1e-6 aren’t uncommon; perhaps even smaller with more complex tasks. I might typically start by finding the order of magnitude that seems to converge, and then tweak a bit from there.

Thanks all - after no success with learning rates, I did some more investigating. I think what was happening here was probably that some of my data was getting totally zeroed - but still passing the energy test that happens prior to the more detailed spectral analysis. Thus I likely had examples, under every label, in which the data was all zeros (alongside the actual data).

It turns out that this is not helpful.

The good news is that if I do my energy test after the thinning of the data I get better accuracy (within my moderately shoddily defined testing procedures) - so it seems like I might be on a reasonable track towards a good level of accuracy. I will continue to sanitise and at some point I’ll need to figure out how it will fare with much bigger training sets (right now I have 6 classes and about 3000 training examples).
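For the record, the all-zeros check itself is trivial; a sketch of the idea in NumPy terms (my framing, not how the actual Max-side analysis works):

```python
import numpy as np

features = np.array([[0.2, 0.7], [0.0, 0.0], [0.5, 0.1]])  # toy data, one example per row
labels = np.array(["a", "b", "a"])

keep = ~np.all(features == 0.0, axis=1)          # True wherever at least one dimension is non-zero
features, labels = features[keep], labels[keep]  # drops the all-zero example and its label
```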

The learning rate ideas may also be a route to getting better results. At the moment most of the parameters to the MLP feel pretty mysterious, so I will make it a point to investigate them more seriously in order to further massage the numbers in the direction of what is, in my case, “good”.


So - in relation to learning the parameters of this object - where should I look? The help file and reference in Max don’t explain the terms - they just name them. Obviously I can google, but is there an advised source of info on this? The forum doesn’t appear to have a comprehensive set of answers (if I search for momentum I just get results from code, for instance).

[Yes, the help files are behind the curve a bit. Bear with us, etc.]
Presumably something going into more detail than the rambling video I did in the summer? With the obvious disclaimer that @groma is the geezer who really knows this stuff, some more in-depth pointers:

This paper by Yoshua Bengio, Practical recommendations for gradient-based training of deep architectures, is a chapter from the eye-wateringly dear book Neural Networks: Tricks of the Trade. Among other things, it describes most (all?) of the adjustable knobs you’ll find on our mlp objects, and gives some indication of how to approach them.

This paper by Leslie Smith, Cyclical Learning Rates for Training Neural Networks, whilst actually about a scheme for programmatically optimising learning rates during training, generalises to some pragmatic advice, I think (which boils down to it being essential to establish a workable range, whether or not you’re using an automated schedule).
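To give a flavour of the triangular policy the paper proposes, a minimal sketch (variable names mine; read the paper for the real thing):

```python
def triangular_lr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    # Learning rate ramps linearly from base_lr up to max_lr and back
    # down again; one full cycle takes 2 * step_size training steps.
    cycle = step // (2 * step_size)
    x = abs(step / step_size - 2 * cycle - 1)  # 1 at cycle edges, 0 at the midpoint
    return base_lr + (max_lr - base_lr) * (1 - x)
```

Even without cycling, running that ramp up once and watching where the loss starts to diverge is the paper’s trick for bounding the usable range.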

The momentum parameter is also quite important in squeezing training performance out of a network. This article, Why Momentum Really Works by Gabriel Goh, dives into that, and because it’s on distill.pub, there are nice widgets to play with.
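The update itself is tiny: each step, the gradient feeds a velocity that decays by the momentum factor, so persistent directions pick up speed. A toy 1-D illustration (mine, not the article’s code):

```python
# Minimise f(w) = w**2 with gradient descent plus classic momentum.
w, v = 5.0, 0.0
lr, beta = 0.1, 0.9           # learning rate and momentum factor
for _ in range(100):
    grad = 2 * w              # f'(w)
    v = beta * v - lr * grad  # velocity: decayed running sum of past gradients
    w = w + v
print(w)                      # ends up near the minimum at 0
```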

This is another chapter from the Tricks of the Trade book, Stochastic Gradient Descent Tricks by Léon Bottou, which intersperses some technical description with pretty clear advice in bold print in boxes (so, good for the skim reader in your life…)

There are also examples in the examples folder which should help intuit some of them… and there are the websites @weefuzzy and @groma curated in the other threads on MLP, from towardsdatascience. Their explanations are quite clear enough to get one going…

This is really fun to play with, and so clear. I presume @groma was right once again and I’ll have to be more patient, lower my learning rate and raise my momentum :smiley:

OK - the main improvements I seem able to make are in terms of the input data - I am now up near 90% accuracy even for stuff that is probably not perfectly represented in my training sets, so this is all quite promising.

In fact my learning rate went higher to do this - I can’t seem to get much effect out of momentum in my particular case.

I still don’t understand what changes when I set validation to something other than zero, however… I get the concept fairly generally, but I don’t understand what will change in the output of this object, or how to make use of this.

@groma will know for sure, but validation would be the ratio of the dataset you put in for training that is kept aside to check your training (instead of checking on the training data). The Google playground (https://playground.tensorflow.org/) has an equivalent, IIUC, which is the ratio (its first argument).

The idea is that validation will enable early stopping if the performance of the network against some reserved test data doesn’t improve over a number of iterations. So it can help stop overfitting.
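In other words, a random slice of the data you pass to fit gets held back before training and is only ever measured against, never learned from. Roughly this, assuming a plain list of examples (names mine):

```python
import random

def holdout_split(dataset, validation=0.2):
    # Shuffle a copy, then reserve the requested fraction for checking;
    # the network only ever trains on the remainder.
    data = list(dataset)
    random.shuffle(data)
    n_val = int(len(data) * validation)
    return data[n_val:], data[:n_val]   # (training set, validation set)

train_set, val_set = holdout_split(range(100), validation=0.2)
```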


Thanks @weefuzzy - as a concept that’s great, but I guess what I’m not understanding is how that works in practice. I’ve seen some patches that have a feedback mechanism for epochs on the forum (is that what I’m supposed to do here?), but from your answer I don’t know:

  • if the early stopping is my responsibility or that of the object
  • when the validation data is used and what for
  • how I access any values related to validation and use them

These are less conceptual questions than object-interface questions, and I don’t know how I can learn the answers (they aren’t in the help or reference, and I don’t see an obvious example in the examples, but I might be skimming them too fast). Likewise the object has three outlets and I don’t know what they do or how to find out… If I’ve missed something, apologies - happy to be pointed to whatever is helpful.

The documentation isn’t anywhere near complete yet, as you’re discovering. FWIW, I don’t think this feature is working as designed at the moment (I have passed this on as a result of you raising it), so if I were you I wouldn’t invest more time trying to understand what it’s doing on this release.

However, in lieu of finished documentation, here are the answers for when this feature is working:

if the early stopping is my responsibility or that of the object

The object

when the validation data is used and what for

During each iteration of training, to see if the updated weights also show improved loss against unseen data

how I access any values related to validation and use them

They are some randomly selected fraction (which you specify) of the dataset you passed to fit

Fortunately it’s the same pattern for all the non-audio objects currently. The rightmost outlet is where the action is for all TB2 objects, and will emit the selector name for whatever message just finished. The middle is the progress outlet (not applicable here). The leftmost is a bang out (not applicable here).

Soon the redundant outlets will be magicked away.

Thanks - that’s all useful - I’m likely validating by hand for now, so I will set it to zero.

If you can be sure to flag clearly when outlets disappear, that will be helpful, as patches will obviously break - although that should generate a Max error, which is always easier to trace.

Outlets won’t disappear without warning, and certainly not on the immediate horizon.

Hi, yes, the validation is broken; it will be fixed soon.

Thanks - what is the criterion (or criteria) for early stopping, and will it be controllable?

I believe the criterion is that the training loop will bail if the model loss w/r/t the reserved validation data fails to improve for five consecutive iterations. The number of iterations isn’t exposed as a parameter.
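If that’s right, the logic amounts to the patience loop below; `step` and `val_loss` are stand-ins for the trainer’s internals, not real toolkit calls:

```python
def fit_with_early_stopping(step, val_loss, max_iter=1000, patience=5):
    # step(): run one training iteration; val_loss(): loss on the
    # reserved validation split. Both are hypothetical stand-ins.
    best, strikes = float("inf"), 0
    for _ in range(max_iter):
        step()
        loss = val_loss()
        if loss < best:
            best, strikes = loss, 0   # improved: reset the counter
        else:
            strikes += 1
            if strikes >= patience:   # e.g. 5 consecutive non-improvements
                break                 # bail early
    return best
```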

Thanks - I guess I was wondering if there was a threshold on “fails to improve”, rather than a number of iterations, but perhaps this is simply a true-or-false scenario in terms of the exact number of correct classifications.