I am training a model and have tried a few different learning rates, but my validation loss is not decreasing. Below is the learning rate finder plot:
I have tried learning rates of 2e-01 and 1e-01, but my validation loss still fluctuates after a few cycles (unlike what Jeremy described in lecture 3, where it increases and then decreases). Below is my validation loss history:
Can you provide the training and validation loss plots, and also the sizes of your training and validation datasets?
That way, we could see what condition your model is in (overfitted, underfitted, too few training examples, etc.).
The training set has 20289 images and the validation set has only 5072 images.
Sorry, I can’t provide the plot since I have shut down the kernel; I would need to retrain the model, which takes around 2 hours.
The link to my Kaggle notebook is https://www.kaggle.com/karanchhabra99/humpback-whale-identification , hope this helps.
According to what you’ve provided, I think your model is overfitted. You can see this in the extremely low training losses and the high validation losses. That is one thing …
The other is that when validation losses go up and down like yours, it suggests that gradient descent is not converging because the learning rate is too large …
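To make the point about large learning rates concrete, here is a minimal toy sketch. It is not the whale model: it runs plain gradient descent on the one-parameter function f(w) = w**2, and the function name, step count, and the two learning rates are assumptions chosen purely for illustration.

```python
def gradient_descent_losses(lr, steps=10, w0=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the loss after each step."""
    w = w0
    losses = []
    for _ in range(steps):
        grad = 2.0 * w        # derivative of w**2
        w = w - lr * grad     # standard gradient descent update
        losses.append(w * w)  # loss at the updated weight
    return losses

# Illustrative values only (not taken from the notebook above).
small = gradient_descent_losses(lr=0.1)  # each step multiplies w by 0.8
large = gradient_descent_losses(lr=1.1)  # each step multiplies w by -1.2

print("lr=0.1:", [round(x, 4) for x in small])
print("lr=1.1:", [round(x, 4) for x in large])
```

With the small rate the loss shrinks on every step; with the large rate the weight overshoots the minimum, flips sign each step, and the loss grows. A real network trained on noisy mini-batches usually shows this as a validation loss that bounces up and down rather than a clean blow-up.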
So to overcome the problem what learning rate should I choose?
I have already tried learning rates of 1e-01 and 2e-01. Or should I reduce the number of cycles?
Yes … the more epochs you train for, the more likely (though not always) your model is to overfit …
While decreasing the number of epochs, also try lower learning rates (1e-1 and 2e-1 are high for most models) …
Let me know how you go …
I have trained the model with a new learning rate of slice(1e-02) and the results are as follows:
Now I am thinking of choosing a new learning rate based on lr_find() and then fitting a few more cycles.
Yes … you should. As a principle of having a healthy model, your training losses should be lower than the validation losses.
Use lr_find() to get a range of LRs to apply to your model. Keep in mind that Jeremy’s default LR is 1e-03 (as far as I remember).
I still think your LR is rather high.
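As a rough sketch of one common way to read a learning rate finder curve: pick a rate where the loss is falling fastest, well before it starts to climb. The helper `suggest_lr` below is hypothetical and is not part of fastai, and the `lrs` and `losses` values are made up for illustration; in practice you would pass in the values recorded during your own lr_find() run.

```python
import math

def suggest_lr(lrs, losses):
    """Return the learning rate at the start of the segment where the loss
    drops fastest per unit of log10(lr). Hypothetical helper, not a fastai API."""
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        step = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = (losses[i] - losses[i - 1]) / step
        if slope < best_slope:  # more negative means a steeper descent
            best_slope = slope
            best_lr = lrs[i - 1]
    return best_lr

# Made-up finder curve: flat, then falling, then rising as the LR gets too large.
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [5.0, 4.9, 4.2, 3.9, 4.5, 9.0]

chosen = suggest_lr(lrs, losses)
print("suggested learning rate:", chosen)
```

On this made-up curve the steepest drop is between 1e-4 and 1e-3, and the loss is already rising by 1e-1, which matches the advice above that 1e-1 and 2e-1 are on the high side. Real finder curves are noisy, so smoothing the losses before taking slopes is usually a good idea.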