Besides dropout, I've thought about duplicating data, or dropping the validation set and retraining on everything once a good-enough model is achieved. I'm not sure whether these are wise moves or whether there are better approaches (especially with smaller datasets).
Other ideas would be (see the sketch after the list for where these knobs live):
- increasing/decreasing the number of hidden layers
- increasing/decreasing the number of activations per layer (hidden size)
- increasing/decreasing the embedding matrix size (embedding dimension)
- increasing/decreasing bptt (the sequence length used per batch for backprop through time)
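
For reference, here's a minimal PyTorch sketch (hypothetical model and parameter names, not any particular library's API) showing where each of those knobs sits in a small LSTM language model: embedding size, hidden size, number of layers, dropout, and bptt as the per-batch sequence length.

```python
import torch
import torch.nn as nn

class SmallLM(nn.Module):
    """Toy language model illustrating the tunable knobs listed above."""
    def __init__(self, vocab_size, emb_size=300, hidden_size=512,
                 n_layers=2, dropout=0.3):
        super().__init__()
        # embedding matrix: vocab_size x emb_size (grow/shrink emb_size)
        self.embed = nn.Embedding(vocab_size, emb_size)
        # hidden_size = activations per layer, n_layers = number of hidden layers
        self.rnn = nn.LSTM(emb_size, hidden_size,
                           num_layers=n_layers,
                           dropout=dropout if n_layers > 1 else 0.0,
                           batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        out, hidden = self.rnn(self.embed(x), hidden)
        return self.head(out), hidden

bptt = 70  # sequence length per batch; smaller bptt = less context but faster, cheaper steps
model = SmallLM(vocab_size=10_000)
x = torch.randint(0, 10_000, (32, bptt))  # batch of 32 sequences of length bptt
logits, _ = model(x)
print(logits.shape)  # torch.Size([32, 70, 10000])
```

With smaller datasets, the usual direction is to shrink these (fewer layers, smaller hidden/embedding sizes) and lean on dropout, rather than the other way around, but I'd be curious what others have found.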