I’ve been trying to find the original ULMFiT code that was used to train the LM published in the paper.
Everywhere I look, I just get referred to Jeremy’s 2018 DL foundations video series for ULMFiT.
However, I found that several components like STLR (slanted triangular learning rates), discriminative fine-tuning, etc. aren’t implemented there. It feels like a demo version of ULMFiT, and not the original model with all the actual hyperparameters that broke SOTA back in its day.
At the end, while scrolling through the code, Jeremy also mentions that this isn’t the actual ULMFiT model and that the real one was trained differently.
Can someone link me to the actual ULMFiT code that was used for the paper?
You are right - you’ll find different hyperparameters and fine-tuning approaches, but in my experience the default fast.ai implementation is still solid enough to reach SOTA on classification tasks. Depending on the dataset you get results similar to transformers - sometimes slightly better, sometimes slightly worse.
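For reference, here is a minimal sketch of that default fastai v2 classification recipe (gradual unfreezing plus discriminative learning rates via `slice`; note that fastai’s default `fit_one_cycle` schedule replaces the paper’s STLR). It assumes a classification DataLoaders `dls_clas` built with the language model’s vocab and a saved encoder named `'finetuned_lm'` - both hypothetical names; the exact learning rates and epoch counts are placeholders:

```python
from fastai.text.all import *

# Assumed: `dls_clas` is a classification DataLoaders built with the LM's vocab,
# and 'finetuned_lm' is a hypothetical name for the saved fine-tuned LM encoder.
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.load_encoder('finetuned_lm')

# Gradual unfreezing with discriminative learning rates (the standard fastai recipe)
learn.fit_one_cycle(1, 2e-2)
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2 / (2.6**4), 1e-2))
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3 / (2.6**4), 5e-3))
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3 / (2.6**4), 1e-3))
```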
I recently trained a couple of language models and ran some experiments on different datasets, using a fairly straightforward approach:
SentencePiece tokenizer, 15k vocab size, pretraining on about 110-160k Wikipedia articles (depending on the language and charset), learning rates found by lr_find (~lr_min). I’m sure there’s room for improvement (hyperparameters, freezing layers, etc.), but that works really well in my experience.
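Not my exact script, but a rough sketch of that pretraining setup with fastai v2, assuming a DataFrame `df` with a `text` column holding the Wikipedia articles; batch size, epochs and the final learning rate are placeholders, and the encoder name matches the classification sketch above:

```python
from fastai.text.all import *

# Assumed: `df` is a DataFrame with a 'text' column of Wikipedia articles.
tok = SentencePieceTokenizer(vocab_sz=15000)  # 15k subword vocab

dls_lm = DataBlock(
    blocks=TextBlock.from_df('text', is_lm=True, tok=tok),
    get_x=ColReader('text'),
    splitter=RandomSplitter(valid_pct=0.1, seed=42),
).dataloaders(df, bs=128)

learn = language_model_learner(
    dls_lm, AWD_LSTM, pretrained=False,   # train the LM from scratch
    metrics=[accuracy, Perplexity()]
).to_fp16()

learn.lr_find()                    # inspect the plot, pick a rate near lr_min
learn.fit_one_cycle(10, 3e-3)      # placeholder value; use what lr_find suggests
learn.save_encoder('finetuned_lm') # encoder to reuse for downstream classification
```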