Training Loss > Validation Loss

This has come up a lot in Lesson 2. We're constantly told that training_loss > validation_loss is bad, yet throughout the lesson the training loss is always larger than the validation loss! Even the teddy bear example shows this: you can see it in the official course-v3 notebook for Lesson 2, under the "Train model" heading.

I looked into the fastai library and realized dropout is on during training but off during the validation phase. Okay! So I turned off dropout by doing:

learn = cnn_learner(data, models.resnet34, metrics=error_rate, ps=0)  # ps=0 disables dropout in the model's head
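
(For context, this train/validation asymmetry is standard PyTorch behaviour, which fastai builds on: a dropout layer is active in train mode and a no-op in eval mode. A minimal sketch:)

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()    # training mode: dropout active
print(drop(x))  # roughly half the entries zeroed, survivors scaled to 2.0

drop.eval()     # eval/validation mode: dropout is an identity op
print(drop(x))  # all ones, unchanged
```

This is why the training loss is computed on a handicapped network while the validation loss sees the full one.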

But now I get a very different result using the same teddy bear example.

Here are my results with dropout on:

Here are my results with dropout off:

Even with dropout off, the validation loss gets tiny while the training loss stays large.

What's going on?

You’re saying ‘even with dropout off, validation loss gets tiny as the training loss stays large’. Dropout is a technique to counter overfitting, so ‘with dropout off’ you would actually expect what you’re describing to be more likely rather than less.

Also, I think models are often trained further even if training_loss > validation_loss. After all, the loss for each part of the split is still going down individually. What you do not want to see is the validation loss going up while the training loss keeps going down, because this implies a model whose ability to generalize is deteriorating (as shown by the increasing validation loss, which is a measure of your model's ability to generalize).
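
In fastai v1 you can watch for that pattern directly; a sketch, assuming the `learn` object from your example above:

```python
# Train for a few epochs, then compare the two curves.
learn.fit_one_cycle(4)
learn.recorder.plot_losses()  # per-batch training loss vs. per-epoch validation loss
```

If the validation curve turns upward while the training curve keeps falling, that is the overfitting signal to act on.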

What exactly is going on in your example I don't know. Maybe your validation set is very small, so your model just 'got lucky' and predicts correctly on those data points, or there is some data leakage, for example the same images occurring in both your training and validation sets.
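
A quick way to rule out the duplicate-image kind of leakage is to hash the file contents of each split and intersect them. This is only a sketch: the folder paths are hypothetical placeholders, and it only catches byte-identical copies, not resized or re-encoded ones.

```python
import hashlib
from pathlib import Path

def content_hashes(folder):
    # Hash raw bytes so renamed copies of the same image still match.
    return {hashlib.md5(p.read_bytes()).hexdigest(): p
            for p in Path(folder).rglob('*.jpg')}

train = content_hashes('data/teddys/train')  # hypothetical paths: point
valid = content_hashes('data/teddys/valid')  # these at your own splits

dupes = set(train) & set(valid)
print(f'{len(dupes)} identical images appear in both splits')
```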