Why does Jeremy say that the train_loss should be less than the validation loss, and that otherwise it leads to overfitting?

Actually, I am still a bit confused about the concepts of training loss and validation loss. I tend to believe that the training loss should always be less than the validation loss: the model knows the training data very well and generalizes from those learnings to new data, whereas the validation set is unseen, so it should give a higher error rate than the train_loss. But Jeremy pointed out something that seems to contradict what I believe, and said that this relates to overfitting as well. Can someone explain this?
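To make my confusion concrete, here is a toy sketch I put together (my own illustration, not from the course): a model that memorises the training data shows a low training loss but a higher validation loss, which is the gap I am asking about. The data, degrees, and seed are all arbitrary choices on my part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: noisy samples from a straight line, split in half.
x = rng.uniform(-1, 1, 40)
y = 2 * x + rng.normal(0, 0.3, 40)
x_train, y_train = x[:20], y[:20]
x_val, y_val = x[20:], y[20:]

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial fit on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# A high-degree polynomial can nearly memorise the 20 training points.
overfit = np.polyfit(x_train, y_train, 15)
# A simple line matches the true data-generating process better.
simple = np.polyfit(x_train, y_train, 1)

print("degree-15 train/val MSE:", mse(overfit, x_train, y_train),
      mse(overfit, x_val, y_val))
print("degree-1  train/val MSE:", mse(simple, x_train, y_train),
      mse(simple, x_val, y_val))
```

In this sketch the degree-15 fit drives its training loss below that of the straight line, yet its validation loss is worse than its training loss, which is the pattern usually described as overfitting.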