Results:
- LSTM language model: 4 epochs, validation loss 3.140521, accuracy 0.376913, perplexity 23.1038
- QRNN language model: 7 epochs, validation loss 3.193912, accuracy 0.367543, perplexity 24.2884
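For reference, the perplexity figures are just the exponential of the validation (cross-entropy) loss, so they're easy to recompute:

```python
import math

# Perplexity of a language model = exp(cross-entropy loss)
def perplexity(val_loss):
    return math.exp(val_loss)

print(perplexity(3.140521))  # LSTM LM
print(perplexity(3.193912))  # QRNN LM
```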
The pre-trained models, along with the itos file, can be found here: https://drive.google.com/open?id=1CZftqrMg-MRH9yXV7FRBv6J_NOtBiK-2
I decided to train the LM on fastai v1 myself. I ended up using Google Cloud and taking advantage of their 300 USD of credits, which let me set up a V100 instance and train there. QRNNs came out to ~30 mins per epoch, while LSTMs took around an hour per epoch. I used a wiki dump and generated a 100M training set with a 30k vocab. All this to say there's definitely room for improvement, and anyone could go ahead and improve on these results.
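In case it helps anyone reproduce or beat these numbers, here's a rough sketch of what the fastai v1 setup looks like. This is not my exact script: the paths, file names, and hyperparameters are placeholders, and the exact API can vary a bit between fastai versions.

```python
from fastai.text import *
import pandas as pd

# Placeholder paths/files: assumes the wiki dump has already been cleaned
# and split into train/valid CSVs with a single 'text' column.
path = Path('data/wiki')
df_trn = pd.read_csv(path/'wiki_train.csv')
df_val = pd.read_csv(path/'wiki_valid.csv')

# Tokenize and numericalize with a 30k vocab
data_lm = TextLMDataBunch.from_df(path, df_trn, df_val,
                                  text_cols='text', max_vocab=30000)

# QRNN variant: flip the qrnn flag in the default AWD-LSTM config
config = awd_lstm_lm_config.copy()
config['qrnn'] = True

learn = language_model_learner(data_lm, AWD_LSTM, config=config,
                               drop_mult=0.1, pretrained=False,
                               metrics=[accuracy])
learn.fit_one_cycle(7, 1e-2, moms=(0.8, 0.7))
learn.save('wiki_lm')           # full model weights
learn.save_encoder('wiki_enc')  # encoder for downstream fine-tuning
```

Leaving `config['qrnn']` at its default gives the LSTM version instead.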
Shoutout to @sgugger for guiding me along the way and fixing a bug just in time for me to train.
If someone could do some baseline testing with this LM, that’d be sweet.
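To get that started, loading the weights and the itos into a fastai v1 learner for fine-tuning should look roughly like this (the file names below are placeholders for whatever the files in the Drive folder are called):

```python
from fastai.text import *

# Build a DataBunch for your own corpus as usual, then drop the downloaded
# weights (.pth) and itos (.pkl) into <path>/models/.
# 'lm_wgts' and 'itos' are placeholder names -- rename to match the Drive files.
path = Path('data/my_corpus')
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv', text_cols='text',
                                   max_vocab=30000)

# For the QRNN weights, also pass a config with config['qrnn'] = True,
# as in the training sketch above.
learn = language_model_learner(data_lm, AWD_LSTM,
                               pretrained_fnames=['lm_wgts', 'itos'],
                               drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)   # fine-tune the LM on the target corpus
learn.save_encoder('ft_enc')   # reuse the encoder for a classifier
```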