After watching lessons 1-3 I decided to try a task on my own. I found this dataset of around 1k labeled captchas.
After some ideas that did not work, I finally got the model to predict >97% of the captchas in the validation set correctly. It was an interesting exercise as I tried different modelling approaches for the problem. I definitely want to write a blog post about the whole procedure.
However, before I do that I need to figure out 2 things:
The train loss is 4 times higher than the validation loss. According to the lecture (lesson 3, I think), this means my model underfits. Is this really the case, or are there other reasons this might happen? If it does underfit, how would I try to fix this in my scenario?
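One thing I'd like to rule out before calling it underfitting: the train loss is measured with the model in training mode, so regularisation like dropout is active, and that alone can inflate the reported train loss relative to the validation loss. A toy plain-PyTorch sketch (placeholder model, not my actual captcha network) of the effect:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy model with heavy dropout; stands in for any regularised network
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 2)
)
x, y = torch.randn(256, 10), torch.randint(0, 2, (256,))
criterion = nn.CrossEntropyLoss()

model.train()  # dropout active, as during training
train_mode_loss = criterion(model(x), y).item()

model.eval()   # dropout disabled, as during validation
eval_mode_loss = criterion(model(x), y).item()

# same weights, same batch -- only the train/eval mode differs,
# yet the two measured losses are not the same
```

If something like this explains part of my gap, is the remainder still a sign of underfitting?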
When I train the frozen model, the validation loss goes down. I'm not predicting a single captcha correctly though. I think that's because of the way I set the model up (I started explaining this in the notebook), so it's not unexpected.
Unfreezing the model eventually leads me to above 97% prediction accuracy.
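For context, the freeze/unfreeze pattern I mean looks roughly like this (a minimal plain-PyTorch sketch with a placeholder backbone and head, not the actual notebook code):

```python
import torch.nn as nn

# hypothetical stand-ins: "backbone" plays the pretrained CNN,
# "head" plays the new captcha classifier layer
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten()
)
head = nn.Linear(8, 36)  # e.g. one output per character class
model = nn.Sequential(backbone, head)

def trainable(m):
    return sum(p.numel() for p in m.parameters() if p.requires_grad)

# phase 1: freeze the backbone, train only the head
for p in backbone.parameters():
    p.requires_grad = False
frozen_count = trainable(model)  # only the head's parameters

# phase 2: unfreeze everything and fine-tune the whole network
for p in model.parameters():
    p.requires_grad = True
full_count = trainable(model)    # all parameters trainable again
```

It's only after phase 2 that the accuracy climbs to 97%.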
BUT: I need a full 60 epochs of training for that. This strikes me as a bit odd. Normally I'd say training for so many epochs only produces overfitting. But since the validation loss keeps going down, that doesn't seem to be the case.
I also tried increasing the learning rate, but that didn't help either (it still took that many epochs and produced worse results).
So the question here is: Why do I need such a large number of epochs for training?
The kernel is here. I'd appreciate any input on this.
Thanks in advance,