Loss, overfitting, underfitting

Sorry for making a new topic about this, but the discussion is scattered in various threads.

My takeaway from the lessons I’ve completed so far is split between two camps:

  1. You want train loss > validation loss (underfitting) or train loss = validation loss (fitting perfectly). As soon as validation loss > training loss, you are overfitting.
  2. As long as your accuracy on the validation set is still increasing, you’re good to continue training.

Can someone help disambiguate for me?


It’s the 2nd. (1) is definitely not true; I actually explicitly made that point a couple of lessons ago. But I do agree many folks on the forum have incorrectly claimed it.


Jeremy – thank you so much for taking the time to respond. Turns out I’ve been prematurely ending training phases for a while. Appreciate it.


Jeremy explicitly made that point in lesson 2 [00:50:30]. Here is the text transcript for that part of the video:

Too many epochs create something called “overfitting”. If you train for too long, as we’re going to learn about, it will learn to recognize your particular teddy bears but not teddy bears in general. Here is the thing: despite what you may have heard, it’s very hard to overfit with deep learning. So we were trying today to show you an example of overfitting, and I turned off everything. I turned off all the data augmentation, dropout, and weight decay. I tried to make it overfit as much as I could. I trained it with a small-ish learning rate and I trained it for a really long time. And maybe I started to get it to overfit. Maybe.

So the only thing that tells you that you’re overfitting is that the error rate improves for a while and then starts getting worse again. You will see a lot of people, even people who claim to understand machine learning, tell you that if your training loss is lower than your validation loss, then you are overfitting. As you will learn today in more detail, and during the rest of the course, that is absolutely not true.

Any model that is trained correctly will always have train loss lower than validation loss.

That is not a sign of overfitting. That is not a sign you’ve done something wrong. That is a sign you have done something right. The sign that you’re overfitting is that your error starts getting worse, because that’s what you care about. You want your model to have a low error. So as long as you’re training and your model error is improving, you’re not overfitting. How could you be?
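To make that rule concrete, here is a minimal sketch in fastai v1 (the library version this part of the course uses; `data` stands for a `DataBunch` you have already built, and the epoch counts are illustrative, not a recipe). It keeps training as long as the validation metric improves and stops once it stalls:

```python
from fastai.vision import *
from fastai.callbacks import EarlyStoppingCallback

# `data` is assumed to be an ImageDataBunch built elsewhere.
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

# Ask for many epochs, but stop once error_rate (computed on the
# validation set) has not improved for 3 epochs in a row. Train loss
# dropping below validation loss along the way is expected, not a bug.
learn.fit_one_cycle(30, callbacks=[
    EarlyStoppingCallback(learn, monitor='error_rate',
                          min_delta=0.001, patience=3),
])
```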


Regarding that: when we train a model, we train it in stages. For a vision-related task, say, we train it like this:

  1. Fine-tune the top layers, keeping the bottom layers frozen.
  2. Unfreeze and train the whole network.
  3. Increase the image size and train again… and so on.

Do we always need to make sure the model is not underfitting in each of the stages, or do we only need to check that in certain stages, like stage 2 (once the model has been unfrozen)?
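For concreteness, those stages might look roughly like this in fastai v1 (a sketch; `data_small` and `data_large` are placeholder `DataBunch`es at two image sizes, and the learning rates are illustrative):

```python
from fastai.vision import *

# Stage 1: train only the new head; the pretrained body stays frozen.
learn = cnn_learner(data_small, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)

# Stage 2: unfreeze and fine-tune the whole network, with smaller
# (discriminative) learning rates for the earlier layers.
learn.unfreeze()
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))

# Stage 3: progressive resizing - swap in data at a larger image size
# and repeat the freeze/unfreeze cycle.
learn.data = data_large
learn.freeze()
learn.fit_one_cycle(2)
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
```

Each `fit_one_cycle` call reports train loss, validation loss, and the metric, so the same train/validation comparison is available at every stage.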


Sgugger’s blog post about the LR finder really helps clarify a lot of this, sort of en passant.
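For reference, the LR finder itself is only a couple of lines in fastai v1 (assuming an existing `learn`):

```python
# Run the learning-rate range test, then plot loss vs. learning rate;
# pick a rate somewhat before the point where the loss blows up.
learn.lr_find()
learn.recorder.plot()
```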


I can’t follow this logic. In the case of overfitting, the error rate is also improving, isn’t it? Wouldn’t only a test set be able to tell us about overfitting here?

Please let me know where my thinking goes wrong. Thanks!

Hi @pors,

In the case of overfitting, the error rate is also improving, isn’t it? Wouldn’t only a test set be able to tell us about overfitting here?

Nope: remember that the error rate (or any metric) is computed only on the validation set, not on the training set. So in the case of overfitting, this is exactly what gets worse (the error rate starts to increase).

When you are overfitting, the loss on the training set is still getting better (decreasing), but the loss on the validation set and the metric computed on the validation set are getting worse.
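In fastai v1 you can see exactly this after training, since the `Recorder` keeps both curves (a sketch, assuming an existing trained `learn`):

```python
# Train loss keeps falling while the validation loss curve turns
# upward: that divergence is the overfitting signature.
learn.recorder.plot_losses()

# The metric (e.g. error_rate), also computed on the validation set,
# worsens at the same time.
learn.recorder.plot_metrics()
```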


Ah yes, of course. This is very helpful! Can I then assume that the best version of my model is the one where the error_rate is lowest, before it starts increasing?

So in this case, the best model would be after 3 epochs?
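If the goal is to keep that best epoch, fastai v1’s `SaveModelCallback` can checkpoint it automatically rather than relying on stopping at exactly the right moment (a sketch; the epoch count and file name are placeholders):

```python
from fastai.callbacks import SaveModelCallback

# Save the weights every time error_rate improves; after training,
# reload the checkpoint from the epoch with the lowest validation error.
learn.fit_one_cycle(8, callbacks=[
    SaveModelCallback(learn, every='improvement',
                      monitor='error_rate', name='best'),
])
learn.load('best')
```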

Indeed; epochs 3 and 4 are quite similar. You could perhaps try adding some more augmentation/regularization to see whether that improves the error_rate.
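For example, both knobs are available when building the data and the learner in fastai v1 (a sketch; `path` and the specific values are placeholders, not tuned recommendations):

```python
from fastai.vision import *

# More aggressive data augmentation when building the DataBunch...
tfms = get_transforms(max_rotate=20.0, max_zoom=1.2, max_lighting=0.3)
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=224)

# ...and regularization via dropout (ps) and weight decay (wd).
learn = cnn_learner(data, models.resnet34, metrics=error_rate,
                    ps=0.5, wd=1e-1)
```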

Augmentation doesn’t seem to help. What does help is more images and larger images (resizing to a larger format).

I don’t know what regularization is, but I’ll go find out. Thanks!