v1.0.53 is live
Breaking change: the default embedding size in the AWD LSTM has changed from 1150 to 1152. Why? Because 8 is the magic number: we need multiples of eight to take full advantage of mixed precision training. With just this change, and making sure the vocab size is also a multiple of 8, pretraining a model on Wikitext-103 takes 6 hours instead of 14 to 20, and fine-tuning on IMDB takes one hour instead of 3 (as long as you have a modern GPU).
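On the user side, trying the new defaults with mixed precision is basically one `to_fp16()` call. Here is a minimal sketch, assuming the IMDB sample data and arbitrary hyperparameters as placeholders (this is not the setup used for the timings above):

```python
from fastai.text import *  # fastai v1

# Placeholder data: any TextLMDataBunch you already have works the same way.
path = untar_data(URLs.IMDB_SAMPLE)
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')

# Tensor cores want sizes that are multiples of 8, so it's worth checking the vocab too.
vocab_sz = len(data_lm.vocab.itos)
if vocab_sz % 8 != 0:
    print(f"vocab size {vocab_sz} is not a multiple of 8; consider padding it for full fp16 speed")

# The new defaults already bake in the multiple-of-8 sizes;
# to_fp16() switches the learner to mixed precision training.
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3).to_fp16()
learn.fit_one_cycle(1, 1e-2)
```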
New exciting things: a backward pretrained model (demonstrated in this example, which reproduces the 95.4% accuracy on IMDB from the ULMFiT paper) and an experimental SentencePiece tokenizer.
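The 95.4% comes from averaging a forward and a backward classifier, as in the ULMFiT paper. The sketch below shows the rough shape of that ensembling on the IMDB sample; it assumes `backwards=True` is forwarded when building the databunch so texts are read right-to-left, and it skips the per-direction LM fine-tuning and `load_encoder` steps that the example notebook actually performs:

```python
from fastai.text import *  # fastai v1

path = untar_data(URLs.IMDB_SAMPLE)  # placeholder; the example uses full IMDB

# Forward classifier.
data_fwd = TextClasDataBunch.from_csv(path, 'texts.csv')
learn_fwd = text_classifier_learner(data_fwd, AWD_LSTM, drop_mult=0.5)
learn_fwd.fit_one_cycle(1, 2e-2)

# Backward classifier: same vocab, texts fed right-to-left (assumed to pair
# with the backward pretrained LM after the usual ULMFiT fine-tuning steps).
data_bwd = TextClasDataBunch.from_csv(path, 'texts.csv',
                                      vocab=data_fwd.vocab, backwards=True)
learn_bwd = text_classifier_learner(data_bwd, AWD_LSTM, drop_mult=0.5)
learn_bwd.fit_one_cycle(1, 2e-2)

# Ensemble by averaging the predictions of the two directions.
preds_fwd, targs = learn_fwd.get_preds(ordered=True)
preds_bwd, _     = learn_bwd.get_preds(ordered=True)
print(accuracy((preds_fwd + preds_bwd) / 2, targs))
```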