Optimising class LanguageModelLoader()

I am posting from my previous post from here.

Fine-tuning the LM takes an extraordinarily long time with a decreased efficiency. Please checkout this notebook. I think this was run nearly a 20 days ago. If you scroll down to the first epoch (after lr_find), you can see it took a little over 2 hours. And after running for 11 epochs which took about 28 hours, the accuracy is aboud 0.583.

However, during my latest run (about 10 days ago), a single epoch took over 12 hours and the full 11 epochs took over 5 days! And the accuracy was nearly 13 points lower (0.449). I used the same dataset with the same exact code. Everything was run in one session (ie, no closing the notebook and loading in the data again).

Everything was run on a system equipped with a V100 (16G of video RAM) and 376G of memory with the latest dev version of FastAI (I always do git pull; pip install -e.[dev] before I start my work). Any idea whats going on?