Lesson 4: why did decreasing the learning rate help?

In lesson 4, Jeremy built a simple CNN.

Then he started to train it (I’ve set the video above to exactly this moment => 1h6m17s, please watch for 20 seconds), and saw that the training set accuracy was growing while the validation accuracy was decreasing.
After 5 epochs of training, he got a training set accuracy of 0.59 and a validation set accuracy of 0.1088.
This can mean only one thing - OVERFITTING!!!

When you overfit, you have several options (see the sketch after this list):

  1. increase the training set
  2. decrease the number of parameters
  3. add dropout …
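
For concreteness, here is a rough PyTorch sketch (my own, not the notebook’s actual code) of what options 2 and 3 could look like: a deliberately small CNN with dropout between layers. The layer sizes, class count, and `p_drop` value are just placeholders.

```python
import torch.nn as nn

class SmallCNN(nn.Module):
    """Hypothetical example: few parameters (option 2) plus dropout (option 3)."""
    def __init__(self, n_classes=10, p_drop=0.25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Dropout2d(p_drop),                      # option 3: dropout
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                   # keeps the head tiny
        )
        self.classifier = nn.Linear(32, n_classes)     # option 2: few parameters

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)
```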

But for some strange reason, Jeremy decided to reduce the learning rate. And even stranger, it worked.

I don’t understand why. Reducing the learning rate should help the model converge and fit the training set better; it is not mentioned as one of the solutions to overfitting. Why did Jeremy use it? Why did it work?

I agree that this is a form of overfitting, although I’d be careful with your “this can only mean one thing” statement. It could also mean something is seriously wrong with the validation data.

That said, it depends a little on how you define convergence. Is it any form of improvement on the training set? Then yes, this model converged. Might it be better to judge it based on the validation data? Probably, and in that case it never converged in the first place.

With the original rate, every batch changed the weights by such a magnitude that the network found it simpler to memorize the training data than to learn the underlying concept of the task. Lowering the rate nudged it in the right direction, and the actual patterns may have become visible.
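
As an illustration (my own sketch, not the lesson’s code, and the learning-rate values are just examples): with plain SGD the weight update is simply lr * gradient, so dropping the rate makes every batch move the weights proportionally less, which is the “nudging” described above.

```python
import torch

# toy weights and a toy loss, just to produce a gradient
w = torch.ones(3, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()                 # w.grad is now [2., 2., 2.]

# same gradient, different learning rates: the per-batch step shrinks with lr
for lr in (1e-1, 1e-3):
    step = lr * w.grad          # size of the weight change this batch (SGD update)
    print(f"lr={lr:g}: update per batch = {step.tolist()}")
```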

But I’d also like to hear Jeremy’s take on this. Maybe I’m totally off.