Sorry for making a new topic about this, but the discussion is scattered in various threads.
My takeaway from the lessons I’ve completed so far is split between two camps:
- You want train loss > validation loss (underfitting) or train loss = validation loss (fitting perfectly). As soon as validation loss > training loss, you are overfitting.
- As long as your accuracy on the validation set is still improving, you’re good to continue training.
Can someone help disambiguate for me?
It’s the second one. The first is definitely not true; I actually explicitly made that case a couple of lessons ago. But I do agree many folks on the forum have incorrectly claimed it.
Jeremy – thank you so much for taking the time to respond. Turns out I’ve been prematurely ending training phases for a while. Appreciate it.
Jeremy actually explicitly made that case in lesson 2 [00:50:30]. Text transcript for that part of the video:
Too many epochs create something called “overfitting”. If you train for too long as we’re going to learn about it, it will learn to recognize your particular teddy bears but not teddy bears in general. Here is the thing. Despite what you may have heard, it’s very hard to overfit with deep learning. So we were trying today to show you an example of overfitting and I turned off everything. I turned off all the data augmentation, dropout, and weight decay. I tried to make it overfit as much as I can. I trained it on a small-ish learning rate, I trained it for a really long time. And maybe I started to get it to overfit. Maybe.
So the only thing that tells you that you’re overfitting is that the error rate improves for a while and then starts getting worse again. You will see a lot of people, even people that claim to understand machine learning, tell you that if your training loss is lower than your validation loss, then you are overfitting. As you will learn today in more detail and during the rest of the course, that is absolutely not true.
Any model that is trained correctly will always have train loss lower than validation loss.
That is not a sign of overfitting. That is not a sign you’ve done something wrong. That is a sign you have done something right. The sign that you’re overfitting is that your error starts getting worse, because that’s what you care about. You want your model to have a low error. So as long as you’re training and your model error is improving, you’re not overfitting. How could you be?
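The quoted rule of thumb ("train while the error rate is still improving") can be sketched as a small early-stopping check. This is a minimal illustration in plain Python, not fastai code; the function name and `patience` parameter are my own hypothetical choices.

```python
def best_epoch(val_errors, patience=3):
    """Return the index of the epoch with the lowest validation error,
    stopping once the error has failed to improve for `patience` epochs.
    Sketch of 'keep training while error improves', nothing more."""
    best_err, best_idx, since_best = float("inf"), 0, 0
    for i, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_idx, since_best = err, i, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_idx

# Train loss may sit below validation loss the whole time, yet the model
# keeps improving until epoch 4, where the error rate bottoms out:
errors = [0.30, 0.22, 0.18, 0.15, 0.14, 0.16, 0.17, 0.19]
print(best_epoch(errors))  # -> 4
```

Note that the stopping signal is the error rate (the metric you care about), not the gap between train and validation loss.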
Regarding that: when we train a model, we often train it in stages. For a vision task, say, we train it like:
- fine-tune the top layers, keeping the bottom layers frozen
- unfreeze and train the whole network
- increase the image size and train again… and so on
Do we need to make sure the model is not underfitting in each of these stages, or only in stages like stage 2 (once the model has been unfrozen)?
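The staged schedule in the list above can be sketched with a toy object that just tracks which layer groups are trainable and what image size is in use. All names here (`ToyTransferModel`, `body`, `head`) are hypothetical stand-ins to illustrate the stages, not the fastai API.

```python
class ToyTransferModel:
    """Toy stand-in for a pretrained model: a 'body' of pretrained
    layers plus a freshly added 'head'."""
    def __init__(self):
        # Stage 1: body frozen, only the new head is trainable.
        self.trainable = {"body": False, "head": True}
        self.image_size = 128

    def unfreeze(self):
        # Stage 2: make every layer group trainable.
        self.trainable = {name: True for name in self.trainable}

    def resize(self, size):
        # Stage 3: progressive resizing -- train again on larger images.
        self.image_size = size

model = ToyTransferModel()
print(model.trainable)   # stage 1: head only
model.unfreeze()
print(model.trainable)   # stage 2: whole network
model.resize(256)
print(model.image_size)  # stage 3: 256
```

In a real setup each stage would be followed by its own round of training, and the underfitting question applies to whichever stage you end on.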
Sgugger’s blog post about the LR finder really helps clarify a lot of this, sort of en passant.
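For anyone who hasn’t read that post: the core idea of an LR range test is to sweep the learning rate exponentially from tiny to huge over a short run and watch the loss. Here is a minimal sketch of just the exponential schedule, under my own assumed parameter names; it is not sgugger’s or fastai’s implementation.

```python
def lr_schedule(lr_min=1e-7, lr_max=10.0, n_steps=100):
    """Exponentially spaced learning rates for an LR range test:
    each step multiplies the LR by a constant ratio so that the
    sweep covers many orders of magnitude evenly in log space."""
    ratio = (lr_max / lr_min) ** (1 / (n_steps - 1))
    return [lr_min * ratio ** i for i in range(n_steps)]

lrs = lr_schedule()
# The sweep starts at lr_min and ends at lr_max.
print(lrs[0], lrs[-1])
```

During the sweep you would record the loss at each step and pick a learning rate a bit below where the loss starts to blow up.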