How to prevent underfitting in NLP?

Yeah, you should always try to overfit first. Reduce the LR to 1e-3, remove all dropout, and set weight decay to 1e-7 (effectively off). Does it overfit now? Once you can get the model to overfit the training data, you can start adding regularization back in.
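A minimal sketch of that overfit-first sanity check, using a toy pure-Python logistic regression instead of a real NLP model (the model, data, and hyperparameter names here are illustrative, not from any specific library): with regularization switched off, training loss on a tiny batch should drop toward zero. If it does not, the problem is capacity or a bug, not regularization.

```python
import math

def train(xs, ys, lr=0.5, weight_decay=0.0, steps=500):
    # Toy single-feature logistic regression: p = sigmoid(w*x + b).
    # weight_decay=0.0 mirrors "remove regularization" from the advice above.
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad = p - y  # d(log loss)/d(logit)
            w -= lr * (grad * x + weight_decay * w)
            b -= lr * grad
    return w, b

def mean_log_loss(xs, ys, w, b):
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

# Tiny separable batch: with no regularization the model should be able
# to memorize it, i.e. drive training loss close to zero.
xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
w, b = train(xs, ys, lr=0.5, weight_decay=0.0)
print(mean_log_loss(xs, ys, w, b))  # should be very small
```

If the loss plateaus well above zero even with regularization off, the model is underfitting for a more fundamental reason; only once this check passes does it make sense to reintroduce dropout and weight decay.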
