Why does "train-loss > val-loss" mean low learning rate?

From what I understand it’s the following.

The model trains on the training set, and the training loss is calculated on that training set. So this loss is calculated on the same dataset the model was trained on and should thus be low.
However, the model has never seen the validation set, so when the validation loss is calculated it is on new data for a model only trained on the training set.

Hence the training loss ought to be lower than the validation loss, because the model was fitted on the same data the training loss is calculated on, whereas it has never seen the validation data. If that's not the case, something is not quite right with the training: the model is likely underfitting. Maybe the learning rate was too low, or maybe the model just needs a few more epochs.
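To make the heuristic concrete, here is a minimal sketch (the function name, the three-epoch window, and the loss numbers are all made up for illustration) that flags the "training loss is not below validation loss" situation described above:

```python
def underfitting_signal(train_losses, val_losses):
    """Return True if train loss >= val loss over the last few epochs,
    suggesting underfitting (e.g. learning rate too low, too few epochs)."""
    recent = list(zip(train_losses, val_losses))[-3:]
    return all(t >= v for t, v in recent)

# Made-up loss curves: train loss never drops below val loss.
train = [1.2, 1.0, 0.9, 0.85]
val   = [1.1, 0.95, 0.85, 0.80]

print(underfitting_signal(train, val))  # -> True, worth raising the LR
```

In a healthy run you would instead expect the training loss to dip below the validation loss after a few epochs, and this check would return False.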

Is that clearer?