Besides dropout, I've thought about duplicating data, or dropping the validation set and retraining on everything once a good-enough model is achieved. I'm not sure whether these are wise moves or whether there are better approaches (especially with smaller datasets).
Other ideas would be (see the sketch after the list for where these knobs live):
- increasing/decreasing the number of hidden layers
- increasing/decreasing the number of activations per layer (hidden size)
- increasing/decreasing the embedding matrix size (embedding dimension)
- increasing/decreasing bptt (the sequence length used per batch for backprop through time)
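
For reference, here's a minimal PyTorch sketch (hypothetical model and parameter names, not any particular library's API) showing where each of those knobs sits in a small LSTM language model: embedding size, hidden size, number of layers, dropout, and bptt as the per-batch sequence length.

```python
import torch
import torch.nn as nn

class SmallLM(nn.Module):
    """Toy language model illustrating the tunable knobs listed above."""
    def __init__(self, vocab_size, emb_size=300, hidden_size=512,
                 n_layers=2, dropout=0.3):
        super().__init__()
        # embedding matrix: vocab_size x emb_size (grow/shrink emb_size)
        self.embed = nn.Embedding(vocab_size, emb_size)
        # hidden_size = activations per layer, n_layers = number of hidden layers
        self.rnn = nn.LSTM(emb_size, hidden_size,
                           num_layers=n_layers,
                           dropout=dropout if n_layers > 1 else 0.0,
                           batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        out, hidden = self.rnn(self.embed(x), hidden)
        return self.head(out), hidden

bptt = 70  # sequence length per batch; smaller bptt = less context but faster, cheaper steps
model = SmallLM(vocab_size=10_000)
x = torch.randint(0, 10_000, (32, bptt))  # batch of 32 sequences of length bptt
logits, _ = model(x)
print(logits.shape)  # torch.Size([32, 70, 10000])
```

With smaller datasets, the usual direction is to shrink these (fewer layers, smaller hidden/embedding sizes) and lean on dropout, rather than the other way around, but I'd be curious what others have found.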