Train loss much higher than valid loss?

I’m wondering if there’s an issue in the notebook lesson2-download.ipynb, associated with part 1 of the course Practical Deep Learning for Coders, v3 (and with the corresponding lesson video). In one of the markdown cells in the section ‘Learning rate (LR) too low’, I am seeing the following:

The text in the cell, starting with ‘Previously we had this result’ (suggesting these were results from a training run with an appropriate learning rate), was there when I first opened the notebook. I didn’t write it, and I didn’t edit it at all. Yet we see a train_loss that is substantially larger than valid_loss in all cases.

But why should train_loss be (an order of magnitude) larger than valid_loss? In the video, this isn’t questioned by the audience.

I don’t know about this particular case, but things like this happen if:

  • your validation set contains samples very close to some of the training samples
  • your validation set is simpler than the training set (e.g. you oversample hard cases for training)
  • you use any form of dropout: train loss is computed with dropout active, while validation loss is computed with dropout disabled, so the validation loss can be lower
  • you apply heavy augmentation to the training data but not to the validation data

But usually the difference is not as dramatic.
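To make the dropout point above concrete, here is a minimal, self-contained sketch (not the notebook’s code; the “activations” and dropout rate are made up for illustration). It simulates a model that already fits its targets perfectly, then shows that the same predictions scored with inverted dropout active (as during training) produce a higher loss than the dropout-free evaluation pass:

```python
import random

random.seed(0)

def mse(pred, target):
    # Mean squared error over a list of predictions.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Hypothetical activations of a model that already matches the target exactly.
target = [0.5] * 10
activations = [0.5] * 10

def dropout(xs, p):
    # Inverted dropout: zero each unit with probability p,
    # scale survivors by 1/(1-p) to keep the expected value unchanged.
    return [0.0 if random.random() < p else x / (1 - p) for x in xs]

# Validation-style loss: dropout disabled, predictions used as-is.
valid_loss = mse(activations, target)

# Training-style loss: dropout adds noise; average over many passes.
train_losses = [mse(dropout(activations, 0.5), target) for _ in range(1000)]
train_loss = sum(train_losses) / len(train_losses)

print(valid_loss, train_loss)  # prints 0.0 0.25 — train-style loss is higher
```

This isolates only the dropout effect; in a real training run the gap also shrinks as the optimizer adapts, which is one reason an order-of-magnitude difference is still surprising.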