I’ve been trying to find the original ULMFiT code that was used to train the LM published in the paper.
Everywhere I look, I just get referred to Jeremy’s 2018 DL foundations video series for ULMFiT.
However, I found that several components like STLR (slanted triangular learning rates), discriminative fine-tuning, etc. aren’t implemented there. It feels like a demo version of ULMFiT, and not the original model with all the actual hyperparameters that broke SOTA back in its day.
At the end, while scrolling through the code, Jeremy also mentions that this isn’t the actual ULMFiT model and that the real one was trained differently.
Can someone link me to the actual ULMFiT code that was used for the paper?
You are right - you’ll find different hyperparameters and fine-tuning approaches, but in my experience the default fast.ai implementation is still solid enough to reach SOTA on classification tasks. Depending on the dataset you get results similar to transformers - sometimes slightly better, sometimes slightly worse.
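For reference, here is a minimal sketch of that default fastai v2 classification recipe (gradual unfreezing plus discriminative learning rates via `slice`; note that fastai’s default `fit_one_cycle` schedule replaces the paper’s STLR). It assumes a classification DataLoaders `dls_clas` built with the language model’s vocab and a saved encoder named `'finetuned_lm'` - both hypothetical names; the exact learning rates and epoch counts are placeholders:

```python
from fastai.text.all import *

# Assumed: `dls_clas` is a classification DataLoaders built with the LM's vocab,
# and 'finetuned_lm' is a hypothetical name for the saved fine-tuned LM encoder.
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.load_encoder('finetuned_lm')

# Gradual unfreezing with discriminative learning rates (the standard fastai recipe)
learn.fit_one_cycle(1, 2e-2)
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2 / (2.6**4), 1e-2))
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3 / (2.6**4), 5e-3))
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3 / (2.6**4), 1e-3))
```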
I recently trained a couple of language models and ran some experiments on different datasets, using a fairly straightforward approach:
SentencePiece tokenizer, 15k vocab size, pretraining on about 110-160k Wikipedia articles (depending on the language and charset), learning rates found by lr_find (~lr_min). I’m sure there’s room for improvement (hyperparameters, freezing layers, etc.), but that works really well in my experience.
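Not my exact script, but a rough sketch of that pretraining setup with fastai v2, assuming a DataFrame `df` with a `text` column holding the Wikipedia articles; batch size, epochs and the final learning rate are placeholders, and the encoder name matches the classification sketch above:

```python
from fastai.text.all import *

# Assumed: `df` is a DataFrame with a 'text' column of Wikipedia articles.
tok = SentencePieceTokenizer(vocab_sz=15000)  # 15k subword vocab

dls_lm = DataBlock(
    blocks=TextBlock.from_df('text', is_lm=True, tok=tok),
    get_x=ColReader('text'),
    splitter=RandomSplitter(valid_pct=0.1, seed=42),
).dataloaders(df, bs=128)

learn = language_model_learner(
    dls_lm, AWD_LSTM, pretrained=False,   # train the LM from scratch
    metrics=[accuracy, Perplexity()]
).to_fp16()

learn.lr_find()                    # inspect the plot, pick a rate near lr_min
learn.fit_one_cycle(10, 3e-3)      # placeholder value; use what lr_find suggests
learn.save_encoder('finetuned_lm') # encoder to reuse for downstream classification
```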