When to stop training when not using validation data

Hi! When we train a model, we split our data into train and val sets to detect when the model overfits or underfits, so that we can save the model at the right time.

My question is: once we know a model architecture is good, we should train with all the data (train + val) so that the model learns more about the data. But if we remove the val data, how do we know when to stop training? Should we just use the same number of epochs and the same learning rates as before and trust that it works the same way?

Thanks a lot !

You should also have a test set. Once you are finished refining your model and feel you have squeezed out all you can, then use the test set to confirm that you are generalizing for all situations and not just the scenarios in your train and validation set.

You could combine your train and val sets, retrain with the same hyperparameters as your final model, and then evaluate it on the test set.

It happens all the time on Kaggle: the model looks great against your validation set, and then when you submit it for testing it comes out terrible.

Does that make sense?

More or less. Let’s say you have your data divided into 3 parts:

  • 60% train
  • 20% val
  • 20% test

You train your model and tune your hyperparameters. You set a maximum of 300 epochs and save only the best model, using the val data both to decide which model works best and to stop the training when no further improvement is detected.
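To make the "save only the best model and stop when val stops improving" loop concrete, here is a minimal sketch in plain Python. The function name and the `patience` parameter are assumptions made for illustration, not part of any particular library, and the loss values are made up.

```python
def best_epoch_with_patience(val_losses, patience=10):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    val_losses: validation loss measured after each epoch, in order.
    patience: hypothetical knob; stop once this many epochs have passed
              since the last new best validation loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    stop_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        stop_epoch = epoch
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch  # a real training loop would save a checkpoint here
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch, stop_epoch


# Made-up validation losses, for illustration only.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.9, 0.95, 0.6]
print(best_epoch_with_patience(losses, patience=3))  # (3, 6)
```

Note that in this toy run training stops at epoch 6 and never sees the lower loss at epoch 8, which is the usual trade-off when picking a patience value.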

Now you want to replicate that last training run (let's say it was 50 epochs at lr=0.01 and then 50 epochs at lr=0.001, the best model occurred at epoch 67, and then it started overfitting). Are you saying that we should now use the test set the way we used the validation set earlier? To me that does not really make sense, because you would also like to use that test set to train the model so that you get the best possible model.
So my only guess would be to replicate the best training phase (50 epochs at lr=0.01 and then 17 epochs at lr=0.001) on the whole labeled dataset, maybe adding some more epochs to account for the extra data.
I am not sure whether I made my point clear.
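A minimal sketch of that "replay the best part of the schedule" idea, assuming the schedule is just a list of (epochs, learning rate) phases. The helper names, the dataset sizes, and the batch size are all hypothetical, chosen only to show the arithmetic.

```python
def truncate_schedule(phases, best_epoch):
    """Keep only the first `best_epoch` epochs of a schedule.

    phases: list of (num_epochs, lr) tuples, run in order.
    """
    remaining = best_epoch
    truncated = []
    for num_epochs, lr in phases:
        if remaining <= 0:
            break
        keep = min(num_epochs, remaining)
        truncated.append((keep, lr))
        remaining -= keep
    return truncated


def total_updates(phases, num_samples, batch_size):
    """Number of parameter updates for a schedule (a partial last batch counts)."""
    batches_per_epoch = -(-num_samples // batch_size)  # ceiling division
    return sum(num_epochs for num_epochs, _ in phases) * batches_per_epoch


original = [(50, 0.01), (50, 0.001)]
replay = truncate_schedule(original, best_epoch=67)
print(replay)  # [(50, 0.01), (17, 0.001)]

# Hypothetical sizes: 6000 train samples, 8000 once val is merged in, batch size 64.
print(total_updates(replay, 6000, 64))  # 6298
print(total_updates(replay, 8000, 64))  # 8375
```

The last two lines show why "same number of epochs" is ambiguous: with more data, the same epoch count means more parameter updates, so you have to decide whether to match epochs or updates.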

Thanks a lot!!

Yeah, it's complicated, aye, and probably dependent on each dataset.
In the Kaggle comps you get the training data, which you are free to split into any train and validation sets. Once you are confident in your model, you then process the test dataset that they provide. I think you are right that in a lot of cases you could incorporate the validation set into your final model.

The main reason in my mind for the test set is to ensure that the validation set you used is actually representative of a bigger picture and that your AI makes good predictions outside of its training wheels. We don't want to find that out in production; we want a final test on data points that the program has never seen before.

All in all I think you are right: if you find good modelling at 67 epochs, then cut it there, add your validation data, and test against the test set. If that predicts just as well, then you could add the test set into your training data too.

I'm confirming it in my brain as I type, so I hope it makes sense.

:slight_smile:

Hahaha, thanks a lot then. Yeah, I think it kind of makes sense. As soon as I have time I will try to carry out some tests in a Kaggle competition and post the results here.

Thanks a lot !!!

I’m running through the lectures again and I remember there being a section at the beginning of one of the lectures where they talk really in depth about the train/test splits. If I see it I will let you know the time in the lecture. I know Rachel also wrote a blog post, though I think it was mainly about getting a good validation set…

Good luck !!!

Yes, I remember that they talked about how to get a good validation set, and in a prior version of this course I remember they said that once you are sure you have a working model, you should add the val data to the train data.
The problem is that I do not remember them saying how to train the model without using the validation data.

Hi there,
there is a part in the Deep Learning book that explains two possible strategies for dealing with your problem: the end of page 245 and page 246 at this link:
https://www.deeplearningbook.org/contents/regularization.html
Note that the authors are not fully enthusiastic about either of them, but at least that gives a starting point, and I guess it’s nice to see that people have already thought about your exact problem…
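As far as I understand that section (the one on early stopping), the two strategies are roughly: (1) retrain from scratch on train + val for the same number of steps that early stopping found, and (2) keep the early-stopped weights and continue training on train + val until the loss on the former validation split falls below the training loss recorded when early stopping halted. Here is a rough sketch of the second one in plain Python. The callbacks, the parameter names, and the `max_rounds` safety cap are my own assumptions, added because the target loss may never be reached.

```python
def continue_until_target(train_steps, val_loss, target_loss, n=100, max_rounds=1000):
    """Sketch of the 'continue training on all data' strategy.

    train_steps(n): hypothetical callback; runs n updates on train + val data.
    val_loss():     hypothetical callback; loss on the former validation split.
    target_loss:    training loss recorded when early stopping halted.
    max_rounds:     safety cap (an assumption), since the loop may not terminate.
    Returns (rounds_run, reached_target).
    """
    rounds = 0
    while val_loss() > target_loss:
        if rounds >= max_rounds:
            return rounds, False
        train_steps(n)
        rounds += 1
    return rounds, True


# Toy stand-in for a model: every call to train halves the loss.
state = {"loss": 1.0}

def toy_train(n):
    state["loss"] *= 0.5

def toy_val_loss():
    return state["loss"]

print(continue_until_target(toy_train, toy_val_loss, target_loss=0.2))  # (3, True)
```

Keep in mind that once the validation split is part of the training data, its loss is no longer an unbiased estimate of generalization, so this is a stopping heuristic rather than a real validation check.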