How to skip language model fine tuning?

I’ve noticed with the IMDB example from the course (end of lesson 3 and beginning of lesson 4) that skipping any additional training of the language model causes a drop in accuracy (0.829 without fine-tuning vs. 0.926 with it) that can quickly be overcome by training on the specific classification task. I’ve done some other work with the MIMIC dataset (specifically the NOTEEVENTS data) and seen similar results (in my case 0.871 without fine-tuning and 0.910 with it).

Language model fine-tuning is computationally expensive, especially on a larger dataset. As shown in lesson 3, fine-tuning the language model takes about 3.5 hours. On a random 10% sample of MIMIC it takes 12+ hours. It seems the time spent on language model training would be better spent elsewhere.

What I’d like to do is go straight from the ULMFiT wikitext-103 pretrained model to my text_classifier_learner. Is that possible? What I’ve done is essentially delete the cells with learn.fit_one_cycle until I get to the cells with text_classifier_learner, but that still leaves a lot of extra steps I’d like to avoid if possible.

More specifically,

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)

requires one to first save an encoder, which requires one to first create a language_model_learner, which requires one to first create a dataset-specific DataBunch, etc.
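For context, here is the dependency chain I’m trying to avoid, sketched as one function. This is reconstructed from memory of the lesson 3 notebook (fastai v1); the exact hyperparameters and epoch counts are assumptions, not a faithful copy of the course code.

```python
def course_pipeline(path, bs=48):
    """Sketch of the full lesson 3 ULMFiT pipeline: fine-tune the
    language model, save its encoder, then build the classifier.
    Assumes fastai v1 is installed (imports kept inside the function)."""
    from fastai.text import (TextList, language_model_learner,
                             text_classifier_learner, AWD_LSTM)

    # 1. Dataset-specific DataBunch for the language model
    data_lm = (TextList.from_folder(path)
               .filter_by_folder(include=['train', 'test', 'unsup'])
               .split_by_rand_pct(0.1)
               .label_for_lm()
               .databunch(bs=bs))

    # 2. Fine-tune the wikitext-103 language model (the slow part)
    learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
    learn_lm.fit_one_cycle(1, 1e-2, moms=(0.8, 0.7))
    learn_lm.unfreeze()
    learn_lm.fit_one_cycle(10, 1e-3, moms=(0.8, 0.7))

    # 3. Save the encoder so the classifier can load it
    learn_lm.save_encoder('fine_tuned_enc')

    # 4. Classifier DataBunch, reusing the language model's vocab
    data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
                 .split_by_folder(valid='test')
                 .label_from_folder(classes=['neg', 'pos'])
                 .databunch(bs=bs))

    # 5. Classifier, initialized from the fine-tuned encoder
    learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
    learn.load_encoder('fine_tuned_enc')
    return learn
```

Every numbered step exists only to feed the next one, which is why deleting the fit_one_cycle cells alone doesn’t remove much of the scaffolding.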

Compare my version (below) with the course version.

Alternatively, if it is simply crazy to suggest not fine-tuning a language model first, please help me put my head on straight.

I’m not sure I’ve got this right, but here are the steps I think are required, leaving out the ones that are not. This uses the same IMDB example from the class, so it should be familiar to anyone who has completed Lesson 3.

If this version is correct, the drop in accuracy is much greater than I saw when just skipping the learn.fit_one_cycle() steps but still creating a vocab and encoder.

The key changes here were leaving out the vocab= parameter in the TextList.from_folder() call and not running learn.load_encoder('fine_tuned_enc').

from fastai.text import *
bs = 48  # batch size (not defined in my earlier draft; 48 is the lesson 3 default)
path = untar_data(URLs.IMDB)
data_clas = (TextList.from_folder(path)
             #grab all the text files in path
             .split_by_folder(valid='test')
             #split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
             .label_from_folder(classes=['neg', 'pos'])
             #label them all with their folders
             .databunch(bs=bs))
data_clas.save('data_clas.pkl')
data_clas = load_data(path, 'data_clas.pkl', bs=bs)
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))

With the first frozen learn.fit_one_cycle(), the accuracy is 0.50, which is no better than chance. Of course, training for more epochs improves results. With 2 frozen and 5 unfrozen epochs, about 60 minutes on Google Colab, accuracy rises to 0.89, which is quite close to the starting point you get when you fine-tune the language model first (and requires less total time than language model fine-tuning). I didn’t attempt to train for ~4 hours as shown in the course, so this wt103-only version doesn’t reach the final accuracy shown in Lesson 3.
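For reference, the 2-frozen + 5-unfrozen schedule above looks roughly like this as a function. The learning rates are assumptions on my part (2e-2 frozen, then the slice with the 2.6**4 divisor from the course for discriminative rates), since I didn’t record the exact values I used:

```python
def train_without_lm_finetuning(learn, base_lr=2e-2):
    """Hypothetical schedule for the wt103-only classifier:
    2 frozen epochs, then 5 unfrozen with discriminative rates."""
    # frozen: only the classifier head trains
    learn.fit_one_cycle(2, base_lr, moms=(0.8, 0.7))
    # unfrozen: all layers train; lower layers get base_lr / 2.6**4
    learn.unfreeze()
    learn.fit_one_cycle(5, slice(base_lr / (2.6 ** 4), base_lr),
                        moms=(0.8, 0.7))
    return learn
```

With base_lr=2e-2 the lowest layer group gets roughly 4.4e-4, so the pretrained wikitext-103 weights move much more slowly than the randomly initialized head.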

The source code is also available here in a notebook that anyone should be able to run directly via Google Colab:

Be sure to change the runtime to use a GPU (I’m not sure whether that happens automatically). See for more details.