I’m having difficulty understanding what to take away from these results. I did a search over learning rates with different weight decays to decide which lr and wd to use, then trained for n epochs.
With more epochs, accuracy and validation loss continue to improve… but training loss gets much smaller. That must be overfitting, right? Yet validation is still improving. What should I take away from this, and how should I decide on the number of epochs?
Thanks for any help!
With 4 epochs I get:
If I recreate the learner again and run for 8 epochs:
With 15 epochs:
We can see that training one cycle with a larger number of epochs produces better validation loss and accuracy, but training loss is diminishing toward zero. Validation loss is not trending up, yet training loss is clearly trending down.
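Is something like this a reasonable way to pick the number of epochs: keep training while validation loss keeps improving and stop once it stalls? A minimal sketch of that rule (the `best_epoch` helper and the loss values are hypothetical, not from my actual runs):

```python
# Sketch: pick the epoch count by early stopping on validation loss.
# val_losses is made up; substitute the per-epoch values from your own runs.

def best_epoch(val_losses, patience=3):
    """Return the 1-indexed epoch with the lowest validation loss,
    giving up once it hasn't improved for `patience` epochs."""
    best, best_idx = float("inf"), 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_idx = loss, i
        elif i - best_idx >= patience:
            break  # validation has stalled; stop here
    return best_idx

# Hypothetical curve shaped like my runs: validation keeps
# improving slowly, so the rule keeps all the epochs.
val_losses = [0.52, 0.41, 0.35, 0.31, 0.30, 0.29, 0.285]
print(best_epoch(val_losses))  # -> 7
```

Under this rule, training loss going toward zero on its own wouldn’t stop training; only validation loss stalling or rising would.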