In lesson 4, Jeremy built a simple CNN.
Then he started to train it (I’ve set the video above to exactly this moment => 1h6m17s, please watch for 20 seconds) and saw that the training set accuracy was growing while the validation accuracy was decreasing.
After 5 epochs of training, he got 0.59 training set accuracy and 0.1088 validation set accuracy.
This can mean only one thing: OVERFITTING!
When you overfit, you have several options:
- increase the training set
- decrease the number of parameters
- add dropout
- …
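To make the dropout option concrete, here is a minimal sketch of a small CNN with dropout layers in PyTorch. The architecture is hypothetical (it is not Jeremy's exact model from the lesson); the point is only where `nn.Dropout` typically goes and that it is active in `train()` mode but disabled in `eval()` mode.

```python
import torch
import torch.nn as nn

# Illustrative small CNN, not the lesson's exact architecture.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Dropout(p=0.25),        # randomly zero 25% of activations during training
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Dropout(p=0.5),         # heavier dropout just before the classifier head
    nn.Linear(32, 10),
)

x = torch.randn(4, 3, 32, 32)  # dummy batch of 4 RGB 32x32 images

model.train()                  # dropout is active here
out_train = model(x)

model.eval()                   # dropout is turned off at inference time
out_eval = model(x)

print(out_train.shape)         # torch.Size([4, 10])
```

The dropout probabilities (0.25 and 0.5) are just common starting values; in practice you would tune them against the validation set.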
But for some strange reason, Jeremy decided to reduce the learning rate instead. And even stranger, it worked.
I don’t understand why. Reducing the learning rate should help the model converge and fit the training set better, and it is not mentioned as one of the solutions to overfitting. Why did Jeremy use it, and why did it work?