FR: adding a 'fit stop' message to regressors

I know this is a bit late in the game for an FR, but this wouldn’t really break things and would make it easier to run big trainings on regressors without needing to pinwheel.

So the main issue is that I, like I imagine many people, do my regressor training in a defer’d loop like this:


This works well in that you can monitor the progress with smaller @maxiter sizes, then manually stop the training when you think you’ve gone far enough.

The issue here is that it’s quite easy to overfit the data, because the “early stopping” criterion is effectively ignored by this approach: even if the loss doesn’t improve over a number of epochs and the actual fitting bails, the next fit message keeps the process going either way.

The version of the patch on the right would work better in that you just run a single huge @maxiter and call it a day, or manually run it a few times if need be. The problem with this is that it’s hard to tell whether you’ve actually fit things well: since the loss range is arbitrary(ish), you get a single number back with no context as to whether it’s good or not.

So what I’m proposing here is to add a message output to the regressors where they output a stop message, or maybe fit stop (the former would likely be better as it wouldn’t break any existing patches) when the “early stopping” criteria is met. That way in a loop like above, I can use that to stop the overall training and avoid overfitting.
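To make the request concrete, here is a rough Python sketch of the behaviour being asked for. Everything in it is hypothetical: `ToyRegressor`, its `fit` return value and the `patience` and `min_delta` parameters are stand-ins made up for illustration, not the regressor objects' actual interface. The point is only that `fit` hands back a "stopped early" flag alongside the loss, so the outer loop can bail on its own.

```python
import random


class ToyRegressor:
    """Hypothetical stand-in for a regressor object: fits y = w*x + b."""

    def __init__(self, learn_rate=0.1, validation=0.2, patience=5, seed=0):
        self.w = 0.0
        self.b = 0.0
        self.learn_rate = learn_rate
        self.validation = validation  # fraction held out on every fit call
        self.patience = patience      # epochs without improvement before bailing
        self.rng = random.Random(seed)

    def loss(self, data):
        """Mean squared error of the current weights on `data`."""
        return sum((self.w * x + self.b - y) ** 2 for x, y in data) / len(data)

    def fit(self, data, max_iter, min_delta=1e-4):
        """Train for up to max_iter epochs; return (training_loss, stopped_early)."""
        shuffled = list(data)
        self.rng.shuffle(shuffled)  # a fresh partition on every call
        n_val = max(1, int(len(shuffled) * self.validation))
        val, train = shuffled[:n_val], shuffled[n_val:]
        best_val, bad_epochs = float("inf"), 0
        for _ in range(max_iter):
            errors = [(self.w * x + self.b - y, x) for x, y in train]
            self.w -= self.learn_rate * sum(2 * e * x for e, x in errors) / len(train)
            self.b -= self.learn_rate * sum(2 * e for e, _ in errors) / len(train)
            val_loss = self.loss(val)
            if val_loss < best_val - min_delta:
                best_val, bad_epochs = val_loss, 0
            else:
                bad_epochs += 1
                if bad_epochs >= self.patience:
                    return self.loss(train), True  # the proposed "stop" signal
        return self.loss(train), False


def train_until_stop(model, data, max_iter=50, max_rounds=1000):
    """Keep sending `fit` until the model reports that it stopped early."""
    history = []
    stopped = False
    for _ in range(max_rounds):
        loss, stopped = model.fit(data, max_iter)
        history.append(loss)
        if stopped:
            break
    return history, stopped


# Synthetic noisy line, purely for the demo.
rng = random.Random(1)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, 2 * x + 1 + rng.gauss(0, 0.05)))

history, stopped = train_until_stop(ToyRegressor(), data)
print(f"rounds: {len(history)}, stopped early: {stopped}, last loss: {history[-1]:.4f}")
```

With a flag like that, the small-@maxiter loop keeps its progress monitoring but no longer needs a manual toggle to end it.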

It would also mean that I could avoid needing a toggle for the loop at all, where it can just train until it’s happy, then bail and let you know how it went.


A more fancy version of the FR would be to have any fit message report the loss as it’s being computed, rather than a single time at the end. Or perhaps it can always spit out x amount of loss values per @maxiter such that if you set @maxiter 100000, it will spit out 10 loss values as it is being computed, so you can see if things are improving over that time.
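As a sketch of what "x loss values per @maxiter" could look like, here is a small Python generator. The function name `fit_with_reports` and its `reports` parameter are invented for this example, and the model is a one-weight toy, so treat it as an illustration of the reporting cadence only. It also reports synchronously between epochs, so it says nothing about how a real-time object would need to thread this.

```python
def fit_with_reports(xs, ys, max_iter=100000, reports=10, learn_rate=0.1):
    """Fit y = w*x by gradient descent, yielding (epoch, loss) `reports` times."""
    w = 0.0
    n = len(xs)
    every = max(1, max_iter // reports)  # epochs between two reports
    for epoch in range(1, max_iter + 1):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learn_rate * grad
        if epoch % every == 0:
            loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
            yield epoch, loss


xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [3.0 * x for x in xs]

# A smaller max_iter than in the post, just to keep the demo quick.
progress = list(fit_with_reports(xs, ys, max_iter=1000, reports=10))
for epoch, loss in progress:
    print(f"epoch {epoch:5d}  loss {loss:.3e}")
```

Ten evenly spaced values are enough to see whether the loss is still falling or has flattened out long before the end of the run.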


Lastly, if there’s a way to do this currently that I’m missing, do let me know.

So: the early stopping criterion is against the error on held out validation data rather than the training data (if @validation > 0), so there isn’t actually a simple way to implement early stopping outside the object: one would have to manually make and test against a validation set.
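For reference, the manual route looks roughly like the Python sketch below. It assumes a hypothetical `fit_chunk` function standing in for one `fit` call on a toy linear model (not the real object), and a `patience` of three rounds picked arbitrarily. The validation set is made once, kept out of training, and checked after every chunk.

```python
import random


def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on `data`."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit_chunk(w, b, train, max_iter, learn_rate=0.1):
    """Hypothetical stand-in for one `fit` call: a few epochs of gradient descent."""
    n = len(train)
    for _ in range(max_iter):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w, b = w - learn_rate * grad_w, b - learn_rate * grad_b
    return w, b


# Synthetic noisy line, purely for the demo.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, 2 * x + 1 + rng.gauss(0, 0.05)))
rng.shuffle(data)
val, train = data[:40], data[40:]  # fixed 20% hold-out, made once

w = b = 0.0
best_val, bad_rounds, patience, rounds = float("inf"), 0, 3, 0
while bad_rounds < patience and rounds < 1000:
    w, b = fit_chunk(w, b, train, max_iter=20)
    val_loss = mse(w, b, val)
    rounds += 1
    if val_loss < best_val - 1e-6:
        best_val, bad_rounds = val_loss, 0  # still improving on held-out data
    else:
        bad_rounds += 1                     # no improvement this round

print(f"stopped after {rounds} rounds, best validation loss {best_val:.4f}")
```

The bookkeeping is small, but it all lives outside the object, which is the inconvenience being described.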

That’s not ideal; we should consider adding an option to report the validation loss as well, which is lowish-hanging fruit. More complicated would be to have something fit-like that actually persists the whole relevant state between calls: tracking the validation loss, but also using the same partitions of the data.

The fancy version would be very hard to do as it stands: the fitting loop would need to be fired off into a different thread in order to report the loss(es) asynchronously.

In this case I’m using the default @validation 0.2, so there is some validation in place; it’s just that fit is being banged over and over externally, obviously.

I guess a validation output message would solve the use case here (I think?), where I can just check for that and call it a day once that stops improving (or whatever).

I guess in general the UX is tricky to navigate (as a non-data person). Ideally this would be either a single-message thing (“make it good”), with stuff like @maxiter behaving more like @zlmaxsize or @maxfftsize, i.e. an upper limit on what you may ever expect to do rather than what you intend to do, or a clearer iterative process where you can monitor the progress and decide when it’s good enough (if you know what you’re looking for).

Don’t know if there’s any overlap with the partition stuff you were talking about in this thread, but I’m a big fan of overhead being added in exchange for ease of use/better results.

‘partition’ here is how the data gets split into training and validation sets (or, more generally, into training and test sets). As it stands, every time fit gets called a new partition is made, selecting a random portion for the training data and validation set. So, in that sense calling fit iteratively isn’t equivalent to calling it once with a big maxIter. How much that matters, I’m not sure tbh – clearly it works well enough for getting the training to converge, but I wonder if it would make comparing validation losses more noisy.
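A quick way to get a feel for the noise question is sketched below in Python. It is a toy under stated assumptions: the `partition` helper and the fixed "already trained" weights are made up for the demo and do not reflect the object's internals. The model never changes, so any variation in the reported validation loss comes purely from which points landed in the validation set.

```python
import random


def partition(data, validation, rng):
    """Shuffle a copy of `data` and split it into (train, validation) lists."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation))
    return shuffled[n_val:], shuffled[:n_val]


def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on `data`."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Synthetic noisy line, purely for the demo.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.random()
    data.append((x, 2 * x + 1 + rng.gauss(0, 0.1)))

w, b = 2.0, 1.0  # a fixed model, so only the split varies

# A fresh partition on every call, as happens when fit is sent repeatedly.
fresh = [mse(w, b, partition(data, 0.2, rng)[1]) for _ in range(10)]

# One partition made up front and reused for every evaluation.
_, fixed_val = partition(data, 0.2, rng)
persistent = [mse(w, b, fixed_val) for _ in range(10)]

fresh_spread = max(fresh) - min(fresh)
persistent_spread = max(persistent) - min(persistent)
print("fresh spread:     ", fresh_spread)
print("persistent spread:", persistent_spread)
```

The reused partition gives a spread of exactly zero, while the fresh ones wobble. How big that wobble is relative to real improvements depends on the dataset size and noise, which is what would need testing.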

(The answer is suck it and see, ofc. I’ll put an issue up to prod at this when I can)


I like this.


I know I keep doing this thing where I read a conversation between you two and just drop a patch here and peace out… maybe it is useful, maybe it is annoying?

Here’s a patch doing classification that creates a training/testing split (randomly about 80/20 respectively) and then validates on the testing data. This way we get the training loss and the validation loss, and the testing set persists across fittings. However, scrambling the validation set (as happens when using validation > 0 and repeatedly calling fit) is in itself a kind of cross-validation that is helpful in preventing overfitting.
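For anyone reading without the patch, the same idea can be sketched in Python. This is not a transcription of the patch: the nearest-centroid classifier, the helper names and the synthetic three-class data are all assumptions chosen to keep the example short. What it shares with the patch is the shape of the workflow: split once, train on one part, and score the held-out part as a percentage correct.

```python
import random


def split_dataset(points, labels, test_fraction=0.2, seed=0):
    """Shuffle once and return (train, test) lists of (point, label) pairs."""
    pairs = list(zip(points, labels))
    random.Random(seed).shuffle(pairs)
    n_test = int(len(pairs) * test_fraction)
    return pairs[n_test:], pairs[:n_test]


def fit_centroids(train):
    """Nearest-centroid 'training': the mean point of each class."""
    sums, counts = {}, {}
    for point, label in train:
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, point):
    """Label of the centroid closest to `point` (squared Euclidean distance)."""
    def dist(label):
        return sum((a - c) ** 2 for a, c in zip(point, centroids[label]))
    return min(centroids, key=dist)


def percent_correct(centroids, pairs):
    """Percentage of (point, label) pairs whose predicted label matches."""
    hits = sum(predict(centroids, point) == label for point, label in pairs)
    return 100.0 * hits / len(pairs)


# Synthetic data: three well-separated 2-D clusters (an assumption for the demo).
rng = random.Random(42)
centres = {"a": (0.0, 0.0), "b": (5.0, 5.0), "c": (0.0, 5.0)}
points, labels = [], []
for label, (cx, cy) in centres.items():
    for _ in range(50):
        points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        labels.append(label)

train, test = split_dataset(points, labels)  # made once, reused for every fit
centroids = fit_centroids(train)
train_score = percent_correct(centroids, train)
test_score = percent_correct(centroids, test)
print(f"train: {train_score:.1f}% correct, test: {test_score:.1f}% correct")
```

Nothing here depends on the number of classes, and a percentage is easier to read at a glance than a raw loss value.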

This patch could certainly be cleaned up, and maybe formed into a nice abstraction?


Most emphatically useful :heart:

Love it :pray: I know I’ve said it before, but a compiled partitioning object feels like an increasingly serious lack. Doing the partition at the point of dataset creation as you do here makes it less faffy than trying to partition extant data / label sets in extant code (which is variably dreadful in different environments: not even sure there would be a way in PD at the moment), but seems like it should be a one-click job in a better world.


Riiiight. Ok, that wasn’t clear to me as I just assumed that some of that would persist over.

So the intended UX here is to just use a big chonky maxiter once and call it a day? (unless one is purposefully trying to cross-validate or do something special).

Yes please!

For stuff in those other threads about computing class means, or removing outliers, etc… it’s an absolute nightmare to manually code the separating/dumping/organizing of the sub-datasets, much more so than the actual computation being done (means/outliers).


I’ve done this kind of stuff manually, which is a big pain in the ass. I do wonder how generalizable this would be with an arbitrary number of classes etc… but being able to get a % correct out of the validation data would be super handy.

(I do wish the reported error was easier to tie to some kind of normalized range so it intrinsically communicated more)