I assume you all already know what Jeremy, and many other prominent scholars, teach us about overfitting in neural networks. It can be summarized as follows: as long as your accuracy keeps improving, don’t worry about your TL dropping below (even significantly below) your VL (see below for a more in-depth discussion if you forgot the lesson).
Of course, I strongly believe this is true. But let us assume we have two models trained on the same data, achieving the same level of accuracy/error rate.
One model, though, still has its TL above the VL (or below it, but not by much).
The second model, on the other hand, got a TL significantly below the VL, or even close to zero. Maybe because you used a bigger network, maybe because you meddled with dropout and the like, maybe because you kept training a bit longer than necessary.
What I want to ask is: will the first model have a better generalization capacity?
For example, if we use that model in production on data whose underlying distribution is quite different from that of the train/validation data, will it perform better than our second model?
I’m asking this because I worked on “real” data with that philosophy in mind, achieving awesome accuracies with models whose TL was a lot below the VL. But when the model was used against test images from the same domain, shot under different conditions, that accuracy worsened a lot.
(*) Long version:
So the only thing that tells you that you’re overfitting is that the error rate improves for a while and then starts getting worse again. You will see a lot of people, even people that claim to understand machine learning, tell you that if your training loss is lower than your validation loss, then you are overfitting. As you will learn today in more detail and during the rest of the course, that is absolutely not true.
Any model that is trained correctly will always have train loss lower than validation loss.
That is not a sign of overfitting. That is not a sign you’ve done something wrong. That is a sign you have done something right. The sign that you’re overfitting is that your error starts getting worse, because that’s what you care about. You want your model to have a low error. So as long as you’re training and your model error is improving, you’re not overfitting.
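The quoted criterion ("you're overfitting only once validation error starts getting worse") can be sketched in a few lines of Python. This is just an illustration of the idea, not anyone's actual training loop: the helper `first_overfit_epoch` and the error values below are hypothetical, and the `patience` threshold is an assumption to avoid reacting to a single noisy epoch.

```python
# Sketch: judge overfitting by the validation-error trend, NOT by the
# train-loss < validation-loss gap. All numbers below are made up.

def first_overfit_epoch(val_errors, patience=2):
    """Return the epoch with the best validation error, once that error
    has failed to improve for `patience` consecutive epochs afterwards.
    Returns None if validation error never stopped improving."""
    best_err = float("inf")
    best_epoch = None
    worse_streak = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err = err
            best_epoch = epoch
            worse_streak = 0  # still improving: by the lesson's criterion, not overfitting
        else:
            worse_streak += 1
            if worse_streak >= patience:
                return best_epoch  # error has been worsening: overfitting started here
    return None

# Validation error keeps improving through epoch 4, then degrades:
val_errors = [0.30, 0.22, 0.18, 0.15, 0.14, 0.16, 0.19]
print(first_overfit_epoch(val_errors))  # → 4
```

Note that the training loss never enters this check at all; by the time epoch 4 is reached, TL may well be far below VL, and the criterion still says nothing is wrong.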