Cool. I have a question about training on a small NLP dataset.
Is there a way to increase dropout after a certain number of epochs?
I find that after training a language model on a different dataset, it overfits after a while. If I increase dropout in the original architecture before any training, it doesn't converge optimally. The loss is still decreasing, so I want to train further, but I don't want to overfit.
Here's what the training output looks like (epoch, training loss, validation loss):
[ 0. 3.93047 3.74086]
[ 10. 3.50935 3.50461]
[ 11. 3.42893 3.45722]
[ 12. 3.32166 3.42504]
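Something like this is what I have in mind, as a rough sketch. It assumes plain PyTorch with nn.Dropout layers; the epoch threshold and the new dropout probability are just placeholders:

```python
import torch.nn as nn

def set_dropout(model: nn.Module, p: float) -> None:
    """Set a new dropout probability on every nn.Dropout layer in the model."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p

# Hypothetical usage inside a training loop: bump dropout once the
# validation loss starts diverging from the training loss.
# for epoch in range(n_epochs):
#     if epoch == 10:              # threshold epoch is illustrative
#         set_dropout(model, 0.5)  # new probability is illustrative
#     train_one_epoch(model, ...)
```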