I’ve managed to solve my problems from above by following the hyperparams discussed in https://forums.fast.ai/t/training-transformerxl/40104.
I am currently getting a perplexity of 29.99 with a 60k vocab on Transformer-XL, using the fastai v1 library. I've trained for 14 epochs so far.
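For anyone comparing this against a loss column: perplexity is just the exponential of the mean per-token cross-entropy, so 29.99 corresponds to a loss of roughly 3.40. Here is a minimal sketch of the conversion, assuming the loss is measured in nats (natural log, as PyTorch's cross-entropy does). The function names are made up for illustration and are not part of fastai.

```python
import math


def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse conversion: perplexity back to mean cross-entropy (in nats)."""
    return math.log(perplexity)


if __name__ == "__main__":
    # 29.99 is the perplexity quoted in this post; the loss is derived from it.
    loss = loss_from_perplexity(29.99)
    print(f"loss = {loss:.4f} nats, perplexity = {perplexity_from_loss(loss):.2f}")
```

Note that perplexities are only comparable between models that use the same tokenization and vocab, so this number should be read in the context of the 60k vocab.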
The notebook (still bare-bones for now) can be found here: https://github.com/cduguet/ulmfit-es/blob/master/ULMFit-TransfXL.ipynb
EDIT: I exported the learner (including weights) for inference. It is available at this link