MLPRegressor training versus validation loss

hi all,
congrats on this great project!

i have some questions concerning MLPRegressor and training,
following the tutorials and using the proposed function
(found here in the MLPReg doc: Making Predictions on the Server (Predicting Frequency Modulation parameters from MFCC analyses))

it outputs the “error” for each fit (the training loss), all fine! it looks like it should, i think.
you see, it was training for many hours (and not stopping; this makes no sense to me, though i tried to get it to stop), so it has (very likely) overfitted,

[1] my question now is:
is there a way to also get the validation loss, to find the point to stop?
(like e.g. in TensorBoard)

[2] the docs say about validation: “If it is found to be no longer improving, training will stop, even if a fit has not reached its `maxIter` number of epochs.”

does “no longer improving” refer to the training loss (it smooths out and training stops because of no improvement)? or does it refer to the validation loss (when the val loss goes up, overfitting starts… then it stops)? in that case “early stopping” would be built in and i wouldn’t need to plot the validation loss…

thank you very much!

best marco

Hi and welcome @doett

It refers to the validation loss so – in principle – you shouldn’t need to track that, and the object should stop early at a useful point.
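(For reference, the size of that held-out set is controlled by the validation parameter. A minimal, untested sketch, assuming an ~mlp and your two datasets ~inputs / ~outputs already exist:)

~mlp.validation_(0.2); // hold out 20% of the training data for the internal validation check
~mlp.maxIter_(10000);  // generous ceiling; early stopping should kick in well before this
~mlp.fit(~inputs, ~outputs, { |loss| loss.postln }); // posts the reported error after each fit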

I would like to develop some better testing and validation tools so that models can be more rigorously compared and evaluated etc. but haven’t yet really figured out what would be a musician-friendly interface (see Removing outliers while creating classifiers - #11 by weefuzzy for the start of a discussion about that)


thank you very much @weefuzzy for the fast answer and the topic link,
this makes sense!

i think it would be helpful to also be able to get the validation loss value for each batch (and plot it) to get an impression of how things are going…
if it is not stopping (like mine now), there is, as far as i can see, no meaningful way to stop it except by instinct (or by ear while using the model - well, other problems start there),
plotting the val loss would give a bit more information for finding a good fit (or further tuning hyperparams by hand… and comparing plots)

thanks

Hi @doett,

I see you’re in SuperCollider and doing regression, so apologies that this patch is in Max and does classification…, but you might check it out. It does a manual training/testing split and some validation. Maybe it’s an example of using the existing FluCoMa tools to build what you’re looking for.

If you do put something like this together in SuperCollider, I would really love to see it. It would be super useful for the community!


thanks @tedmoore
i’ll try something and come back later!


hi, so this is what i have come up with so far,

i split my datasets (in and out) into train and test splits (by a % ratio),
then i calculate the MSE between the model’s predictions and the ground truths from the out test set and plot it.
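(roughly like this - a simplified sketch of the idea with placeholder names (~mlp, ~inTest, ~outTest, ~testIds) and sizes, not the exact code from the attached file:)

~testLoss = { |numOut = 2|
	var inBuf    = Buffer.alloc(s, 13);      // one input point (e.g. 13 MFCCs)
	var predBuf  = Buffer.alloc(s, numOut);  // the prediction
	var truthBuf = Buffer.alloc(s, numOut);  // the ground truth from the out test set
	var cond = Condition.new;
	var sum = 0;
	Routine{
		~testIds.do{ |id|
			~inTest.getPoint(id, inBuf, { cond.unhang }); cond.hang;
			~mlp.predictPoint(inBuf, predBuf, { cond.unhang }); cond.hang;
			~outTest.getPoint(id, truthBuf, { cond.unhang }); cond.hang;
			predBuf.loadToFloatArray(action: { |pred|
				truthBuf.loadToFloatArray(action: { |truth|
					sum = sum + ((pred - truth).squared.sum / numOut); // squared error of this point
					cond.unhang;
				});
			}); cond.hang;
		};
		("test MSE: " ++ (sum / ~testIds.size)).postln;
	}.play;
};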

using .predictPoint and .getPoint in a looped routine was my first guess, but it’s a bit messy and not very solid…
the attached file tries to bring some problems of this approach to your attention -
i hope there is a more reliable way; for now this works for me somehow.
the model will perform at a concert next week, so - well - it is what it is at the moment…

the output shows training loss / training loss (exp moving avg) / test loss / test loss (moving avg)
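(the smoothed curves are just a simple exponential moving average of the raw losses; something like this, with alpha picked by hand:)

~alpha = 0.1; // smoothing factor: smaller = smoother
~ema = nil;
~smooth = { |newLoss|
	~ema = if(~ema.isNil) { newLoss } { (~alpha * newLoss) + ((1 - ~alpha) * ~ema) };
};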

the model trains ok i think, but cannot really deal with the dataset
(which is part of the idea somehow…), and the test loss is very noisy -
that’s the reason why it never stops training, i guess.

i am very grateful for any comments (on clumsy things and errors…)
and maybe other directions for implementing it in the flucoma world!
thank you

MLPReg.zip (1.2 MB)
file and dSets

@doett,

I like the way you’ve put this together. I’ve only poked at the code a bit; I will do more this week.

Yes, it looks like your training loss keeps going down and your testing loss keeps going up! Of course this is classic overfitting. With these small datasets and our musically-oriented goals, sometimes this isn’t really a problem; what matters is how it sounds!

Thanks for sharing this and for putting together a nice testing/validation loop!

T


thanks ted,
it somehow does what i expect it to do for this one…

but digging a bit deeper and trying other things, i just read that shuffling the test dataset for each batch (which is what i did) is not necessary (or even a mistake… is this so? i was quite certain that it should be shuffled, can’t find the source again right now…)

it only introduces unnecessary randomness into the test loss (like in my last pics) - machine learning - why would you shuffle the test dataset? - Stack Overflow

if i don’t shuffle it (see below) it looks fine (and should at a certain point increase due to overfitting - i’ll let it run and see later…)

(in the ~validation function, changing one line:
var batchIds = ~testIds[..batchsize-1]; // same each batch (randomised only once)
// var batchIds = ~testIds.scramble[..batchsize-1]; // new random test data points every batch

thank you
best marco

seems to work - and looks familiar
(but internally it is not stopping - why?)


thank you!

Hello @doett and thanks for this fantastic patch! I am doing a few tests now while I reply…

First, a few optimisation ideas: you can run predict instead of predictPoint to run your tests in one go (although the randomisation might become tedious), then toBuffer to export… also, to get rid of your sync problems, you can use a Condition to wait instead of timers in your Routine.
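Something in this direction (an untested sketch, assuming ~mlp, ~inTest and an empty FluidDataSet ~predictions already exist):

~done = Condition.new;
Routine{
	~mlp.predict(~inTest, ~predictions, { ~done.unhang }); // run the whole test set in one go
	~done.hang;                                            // wait here instead of using a timer
	// ...now compare ~predictions with ~outTest (per point, or export with toBuffer)...
	"predict done".postln;
}.play;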

I didn’t change those, I was too eager to get to the results (or lack of stopping)… I only set the mlp’s validation size to the same percentage (~split) to make sure we compare the same thing. Or maybe I don’t understand what you are trying to do…

I’ll let your patch run for an hour, since at the moment I haven’t got anything but noise in my validation - I’m also reading the implementation code to check there.

more soon, I hope

so, the algo is waiting for 10 batches in a row of the validation error to be going up.

I think we would need to do your tests with maxIter=1 so that we can see the internal value. I might compile with a printout just to follow whether it actually stops, which I’m 99.99999% sure it does… but hey, life is full of surprises :slight_smile:

(if you are interested, the code is relatively easy to read, as it uses a lot of Eigen and good variable-name discipline, thanks to @groma and @weefuzzy)


hi @tremblap, thank you for the optimisation ideas, i will try this! makes sense…

to avoid the noise in the test-loss graph you could set
var batchIds = ~testIds[..batchsize-1]; // same each batch (randomised only once)
in the ~validation func (the noise comes from giving each batch new test data points…)

as far as i understand this makes no difference (it only has an impact on training):
before the mlp sees any training data, the test data is split off and the mlp never sees it,
same for the validation data (it is never used to train), but it can have a different % because that split is taken from the training set (not the test set).
does this make sense?
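(to illustrate with made-up numbers, just for the proportions - not my actual set sizes:)

(
var total     = 1000;                             // points in the full dataset
var testRatio = 0.05;                             // my hand-made train/test split
var valRatio  = 0.1;                              // the mlp's validation parameter
var testSize  = (total * testRatio).asInteger;    // 50 points the mlp never sees at all
var trainSize = total - testSize;                 // 950 points handed to fit
var valSize   = (trainSize * valRatio).asInteger; // 95 of those are held out internally
var fitSize   = trainSize - valSize;              // 855 points are actually trained on
[testSize, trainSize, valSize, fitSize].postln;
)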

i am curious if it stops… ; )

ok, i have a nice printout version (dirty hack) which lets me see the patience count and validation error, so in a few hours I can see if it ever gets to 10 :slight_smile:

it does make a difference as you are changing the ratio of what you test vs what you train on, no?

in the code I pointed at we have (simplified)

  1. for each iteration:
     a. permute the indices of the training set
     b. for each batch:
        i. forward-process the whole batch
        ii. compute the error
        iii. backpropagate
     c. if there is a validation set:
        i. compute the loss over the whole validation batch
        ii. if it is larger than in the previous iteration, increment a counter; if not, reset the counter
  2. once all iterations are finished:
     a. run the whole input (not just the training set) through the network and output the error

once the ‘larger than the previous iteration’ count reaches 10, you stop early (it means the validation loss hasn’t improved for 10 iterations in a row)
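in sclang-ish pseudocode, the stopping condition is roughly this (my paraphrase, not the actual C++):

(
var patience = 10, count = 0, previousValLoss = inf;
~shouldStop = { |valLoss|
	if(valLoss > previousValLoss) { count = count + 1 } { count = 0 };
	previousValLoss = valLoss;
	count >= patience; // true -> stop before maxIter is reached
};
)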

so at the moment I never get more than 3 iterations in a row going up (since I started running it 15 minutes ago), so I’ll let you know if I ever reach it. I’m also plotting, so we’ll see if I get what you have.

more soon (or later - it seems it takes time) :slight_smile:


yes absolutely! but the ratio of the validation split (e.g. 0.1 in the mlp params) and the ratio of the train/test split i built and used (0.05) are not connected (so they don’t need to match) - because those points never overlap (the test points are never used for training), that’s what i meant…

yes, i see - the code is very clear! great work
and thank you very much for running it!! waiting here…

exciting, I now get at times 4 iterations in a row that are going up… another hour I reckon… or maybe more


I’ve edited the above since I wrote the exact opposite: it is 10 in a row not improving, so going up… :man_facepalming:

anyway, still the same riveting excitement :slight_smile:


yay - fantastic!!
so it’s breaking out now? or not…
anyway, on my end it’s great to know that it would stop early,
i can also build this into the training loop in sclang to exit (see the sketch below),
merci!!
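(something like this is what i have in mind - an untested sketch with placeholder names, reusing a ~testLossFunc that computes the MSE over my test split:)

(
~patience = 10;
~bestLoss = inf;
~badCount = 0;
~chunk    = 0;
Routine{
	var done = Condition.new;
	while { ~badCount < ~patience } {
		var testLoss;
		~mlp.maxIter_(10);                               // train in small chunks
		~mlp.fit(~inTrain, ~outTrain, { done.unhang });
		done.hang;
		testLoss = ~testLossFunc.value;                  // assumed helper: MSE over the test split
		if(testLoss < ~bestLoss) { ~bestLoss = testLoss; ~badCount = 0 } { ~badCount = ~badCount + 1 };
		"chunk %: test loss % (bad count %)".format(~chunk, testLoss, ~badCount).postln;
		~chunk = ~chunk + 1;
	};
	"stopped: no improvement for % chunks".format(~patience).postln;
}.play;
)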

in any case, it is still too jumpy to reach 10 non-improvements in a row. I’ll let it run and see how it fares.

i see, thanks a lot!

ok I got it to stop (although it doesn’t stop your training loop…)

so it works - but how to use it in the language is an interesting question.

for instance, set the momentum and learnRate as below, with a super high number of iterations and a large batch size. it’ll stop when it doesn’t improve for 10 iterations in a row. that is the use case - then you can check whether the overall error is good enough. if not, reseed (or not) and restart? It is sort of the opposite of what you do (trust that it stops early, then assess the fitness).

~mlp.learnRate_(0.001).momentum_(0.999).batchSize_(100).maxIter_(10000);
~mlp.fit(~inTrain, ~outTrain, (_.postln)); // posts the reported error when the fit returns

(with my special troubleshooting build I can see it actually stops early when it deserves to)

@weefuzzy one question: we publish the error over the full set (not just the training set, nor just the validation set), even when we use validation. I reckon this is right.

interface-wise, I wonder what the use case for each would be…