In lesson 1, “Review: easy steps to train a world-class image classifier”, the last step says we should keep training until we overfit. However, in practice I find that I sometimes can’t reach the point of overfitting, and performance just gets stuck.
For example, on a binary image classification problem I used “densenet201”, and after training for a while my losses were (0.23, 0.27) ((train loss, val loss), with negative log likelihood as the criterion). If I keep training, the train loss keeps decreasing (though not stably) while the val loss increases. That looks like a sign of overfitting, but it probably isn’t, because with “resnet50” I got (0.15, 0.17), so my “densenet201” model hasn’t reached its full potential yet.
I’m using Adam, and at this stage I’m already using very small learning rates (~1e-7). I’ve been using differential learning rates and the 1cycle policy, and following every tip I could find here.
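To make sure I understand the schedule I’m applying: here is a minimal, stdlib-only sketch of how I believe the 1cycle learning-rate schedule behaves (linear warmup to a peak, then a cosine anneal down; the function name, `pct_start`, `div`, and `final_div` defaults are my own approximations, not fastai’s actual implementation):

```python
import math

def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div=25.0, final_div=1e4):
    """Approximate 1cycle LR at a given step (sketch, not fastai's code).

    - Warmup: linear from max_lr/div up to max_lr over pct_start of training.
    - Anneal: cosine from max_lr down to max_lr/final_div for the rest.
    """
    warm_steps = int(total_steps * pct_start)
    start_lr = max_lr / div
    final_lr = max_lr / final_div
    if step <= warm_steps:
        t = step / max(warm_steps, 1)          # 0 → 1 across warmup
        return start_lr + (max_lr - start_lr) * t
    t = (step - warm_steps) / max(total_steps - warm_steps, 1)
    return final_lr + (max_lr - final_lr) * 0.5 * (1 + math.cos(math.pi * t))

# With max_lr=1e-7 the floor is max_lr/final_div ≈ 1e-11, i.e. effectively
# zero updates late in training — which may be part of why I see no progress.
print(one_cycle_lr(0, 100, 1e-7))    # warmup start: max_lr/25
print(one_cycle_lr(30, 100, 1e-7))   # peak: max_lr
print(one_cycle_lr(100, 100, 1e-7))  # end of anneal: max_lr/1e4
```

Given this shape, my worry is that with a very small `max_lr` the effective learning rate for most of the cycle is vanishingly small.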
Do you have any suggestions on what I can do here, or any hint as to what the problem might be? Should I keep training with extremely low learning rates (< 1e-8)? I don’t believe “densenet201” is fundamentally unable to match “resnet50”…
Thank you very much!