Super convergence(ish) on wikitext-2

aayushy · July 16, 2018, 12:10pm

Ah, I see. So accuracy and loss are still a good approximation of language model performance during training, but for more concrete results, perplexity is what matters.
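For anyone else reading along, here is a minimal sketch of how the two numbers relate, assuming the reported loss is the mean per-token cross-entropy measured in nats (natural log). Under that assumption, perplexity is just the exponential of the loss. The function names `mean_nll` and `perplexity` are made up for illustration and are not from the script in this thread.

```python
import math


def mean_nll(token_probs):
    """Mean negative log-likelihood (in nats) of the correct tokens.

    token_probs: probabilities the model assigned to each correct token.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(mean_loss_nats):
    """Perplexity from a mean per-token cross-entropy loss in nats."""
    return math.exp(mean_loss_nats)


# Toy example (made-up probabilities): a model that gives every correct
# token probability 0.25 is as uncertain as a uniform choice among 4 tokens.
loss = mean_nll([0.25, 0.25, 0.25, 0.25])
print(loss, perplexity(loss))
```

Because the exponential is monotonic, a lower loss always means a lower perplexity, but small differences in loss become larger differences in perplexity.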

Also, thanks for the script! An Adam-W implementation was much needed.