FastaiV1 imdb_scripts


Hello everyone, I’ve implemented ULMFiT using the imdb_scripts. I would like to use fastai v1 now that it is available, but the scripts don’t work with it out of the box. For example, some scripts import from fastai.learner, which existed in the old fastai but not in the new one. I assume the imported functions now live somewhere else in fastai v1, but so far I’ve had no luck finding them. Is there any documentation or guide for updating code to work with fastai v1?
Thanks in advance.

(Brian Muhia) #2

You only need to run, since that script generates the train.csv and valid.csv files. Once you have those and want to train a language model from scratch, read the CSVs into DataFrames and add a dummy label column:

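import pandas as pd
from pathlib import Path

DATA = Path('data/imdb')  # assumed location; point this at wherever the csvs were written
# the csvs have no header row, so the text ends up in column 0
trn_df = pd.read_csv(DATA/'train.csv', header=None)
val_df = pd.read_csv(DATA/'valid.csv', header=None)
# a language model ignores labels, so every row gets a dummy label of 0
trn_df['labels'] = 0
val_df['labels'] = 0
# write label first, then text, with no index or header
trn_df[['labels', 0]].to_csv(DATA/'lm_train.csv', index=False, header=False)
val_df[['labels', 0]].to_csv(DATA/'lm_valid.csv', index=False, header=False)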
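If you'd rather not load whole files into memory, the same conversion can be done row by row with Python's standard csv module. This is a minimal sketch that assumes, like the pandas code, that the text sits in the first column of a headerless csv; make_lm_csv is a hypothetical helper name, not something from fastai or the imdb_scripts.

```python
import csv
from pathlib import Path


def make_lm_csv(src: Path, dst: Path, label: int = 0) -> int:
    """Copy the first column of a headerless csv into a label,text csv.

    Every row gets the same dummy label, since a language model ignores it.
    Returns the number of rows written.
    """
    n = 0
    with open(src, newline='', encoding='utf-8') as fin, \
            open(dst, 'w', newline='', encoding='utf-8') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            if not row:  # skip blank lines
                continue
            # keep only the text column, prefixed by the dummy label
            writer.writerow([label, row[0]])
            n += 1
    return n
```

Call it once per split, e.g. make_lm_csv(DATA/'train.csv', DATA/'lm_train.csv') and the same for valid. The csv module handles quoting, so texts containing commas or newlines survive the round trip.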

Full example of pretraining a language model on French Wikipedia:


That isn’t my issue. My issue is doing the whole process that the scripts used to handle, but in fastai v1. For example, to fine-tune the language model you used to build a LanguageModelData from LanguageModelLoader objects, but LanguageModelData no longer exists in fastai v1, and you have to use new classes to replace it. I’m currently updating my code by hand, but it’s taking quite some time to work out which fastai v1 classes replace the old fastai 0.7 ones. The point of this thread was to find out whether there is a guide to speed this up, or whether someone who has already done the port could tell me what the main changes are.
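Without an official migration guide, one way to at least find every spot that needs changing is to scan the old scripts for fastai 0.7 names. This is a minimal sketch, not a complete guide: the lookup table is partial and unofficial, covering only three renames I'm fairly sure of as of early fastai 1.0.x (Learner now lives in fastai.basic_train, TextLMDataBunch takes over the role of LanguageModelData, and RNN_Learner became RNNLearner). find_old_names is a hypothetical helper, not part of fastai, and names may have moved in later v1 releases, so check the v1 docs before relying on it.

```python
import re

# Partial, unofficial map from fastai 0.7 names to the fastai v1 names that
# play the same role. Not exhaustive; verify against the fastai v1 docs.
V07_TO_V1 = {
    'fastai.learner': 'fastai.basic_train',  # module holding Learner in v1
    'LanguageModelData': 'TextLMDataBunch',  # defined in fastai.text.data
    'RNN_Learner': 'RNNLearner',             # defined in fastai.text.learner
}


def find_old_names(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, old name, suggested v1 name) for each hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in V07_TO_V1.items():
            # word boundaries stop a name from matching inside a longer identifier
            if re.search(r'\b' + re.escape(old) + r'\b', line):
                hits.append((lineno, old, new))
    return hits
```

Feed it the text of each old script, e.g. find_old_names(Path('old_script.py').read_text()) with a hypothetical file name. A rename is rarely a drop-in swap, since constructor arguments changed too, so treat the hits as a to-do list rather than an automatic fix.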


After a few attempts at adapting the code, I decided it was easier to redo it from scratch. The good news is that the fastai.text documentation walks through this very example, so it wasn’t too hard. So if anyone wants to use ULMFiT with fastai v1, just go to and you should be good to go.


Do you mind sharing your code? I’m also trying to reproduce the training on the full IMDB dataset, but without success so far.