Regression using Fine-tuned Language Model

I believe the code needs to be updated to cast the PyTorch tensors to NumPy arrays on save (and then from NumPy back to PyTorch tensors on load).

Something like:

np.save(cache_path/'train_ids.npy', [i.numpy() for i in self.train_ds.x.items])
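
For completeness, the reverse direction on load would be something like the sketch below (cache_path and the items attribute follow the snippet above; allow_pickle is needed on newer NumPy versions because a list of variable-length arrays is saved as an object array):

ids = np.load(cache_path/'train_ids.npy', allow_pickle=True)  # object array of int arrays
self.train_ds.x.items = [torch.from_numpy(a) for a in ids]    # back to tensors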

Yes, I ended up doing that. I’m just wondering if that’s in fact what the devs intended, and also trying to bring it to their attention so they can fix it in the code.

FYI: Just pulled the latest from master and this looks to be fixed. Not sure when it will be rolled into one of the release branches though.


Yup :slight_smile: I did the same thing and it is fixed!

Yes load and save were broken, and I fixed them this afternoon.


I was able to successfully fine-tune an LM using the pre-trained model with the data block API on a custom dataset. I highlight the (small number of) steps here for documentation:

Assuming our data is in a pandas DataFrame with different fields that need to be added to the text:

# my dataset consists of name and item_description
data_lm = (TextList.from_df(texts, PATH, cols=['name', 'item_description']) 
          .random_split_by_pct(0.1)
          .label_for_lm() # this does the tokenization and numericalization
          .databunch())

data_lm.save('lm-tokens')
# load the data (can be used in the future as well to prevent reprocessing)
data_lm = TextLMDataBunch.load(PATH, 'lm-tokens')
data_lm.show_batch() # take a look at the batch fed into the GPU
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.5, callback_fns=ShowGraph)
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))
learn.recorder.plot_losses()
learn.save('fit-head')
learn.load('fit-head')
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(11, 1e-3, moms=(0.8,0.7))
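
One extra step worth considering at the end (a sketch; save_encoder is the fastai v1 mechanism for reusing the fine-tuned RNN in a downstream classifier or regressor, and 'fine-tuned-enc' is just a placeholder name):

learn.save_encoder('fine-tuned-enc')  # save the encoder for downstream tasks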

With a dataset of 5,635,745 rows, it took me 21 hours, 22 minutes, and 6 seconds to run this on a V100, with a final training loss of 2.697805, validation loss of 2.571279, and accuracy of 0.524987.


Did you wind up getting regression to work with a pre-trained LM?

Not yet, although I haven’t worked on it for a while. The devs mentioned that the latest data block API makes working on a regression problem easier, but I haven’t yet looked at it. I will in the next couple of days.

Gotcha. I’m trying to figure out how to do regression on the indices of particular tokens, kind of like bounding boxes with category labels in a CNN, but I haven’t yet worked out what the input/head needs to look like for that to work.

I’m working on a similar task (trying to turn the text classifier into a regressor) and, thanks to the posts above, have gotten my databunch set up. I’ve changed the loss function like so:

def rmse(preds, targs):
    "Compute root mean squared error."
    # torch.sqrt (rather than np.sqrt) keeps this working on GPU tensors
    return torch.sqrt(torch.mean((targs - preds).pow(2)))

learn.loss_func = F.mse_loss
learn.metrics = [rmse]
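
As an aside, fastai v1 also ships MSELossFlat (in fastai.layers), which flattens the predictions before computing MSE; if the regression head outputs shape [bs, 1] while the targets are [bs], swapping it in may avoid shape-mismatch problems (a sketch, not something from the original post):

learn.loss_func = MSELossFlat()  # flattens preds before the MSE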

The final layer of the classifier is a Linear layer going from input size 50 to output size 0 (in my case), and I’ve changed that to output size 1 by altering the __init__ method of the PoolingLinearClassifier (the rest is left out for brevity):

class PoolingLinearRegressor(nn.Module):
    "Create a linear regressor with pooling."

    def __init__(self, layers:Collection[int], drops:Collection[float]):
        super().__init__()
        mod_layers = []
        activs = [nn.ReLU(inplace=True)] * (len(layers) - 2) + [None]
        for n_in,n_out,p,actn in zip(layers[:-1], layers[1:], drops, activs):
            mod_layers += bn_drop_lin(n_in, n_out, p=p, actn=actn)
        # swap the final classification layer for a single-output head
        mod_layers[-1] = nn.Linear(in_features=50, out_features=1, bias=True)
        self.layers = nn.Sequential(*mod_layers)

When I try training the learner, I hit an embedding index error:

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1410         # remove once script supports set_grad_enabled
   1411         torch.no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1412     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: index out of range at /opt/conda/conda-bld/pytorch-nightly_1543482224190/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191

I don’t understand yet how my model changes have affected the embeddings.
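
A quick diagnostic for this kind of error (a sketch assuming the standard fastai v1 text model layout, where the first module’s encoder attribute is the token embedding) is to compare the embedding table size with the vocab used to numericalize the data:

emb = learn.model[0].encoder  # token embedding of the RNN encoder
print(emb.num_embeddings, len(learn.data.vocab.itos))
# an index-out-of-range inside torch.embedding usually means some batch
# contains token ids >= emb.num_embeddings, i.e. the model was built with
# a smaller vocab than the one used to numericalize the data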

So, I’ve started working on this problem again. I decided to build a new LM due to lots of API changes and, more importantly, because I like to have updated code. I ran the same code as above, and now I get the following error:

data_lm = (TextList.from_df(texts_df, path, cols=['name', 'item_description'], processor=[tok_proc, num_proc])
         .random_split_by_pct(0.1)
         .label_for_lm(cols=['name', 'item_description'])
         .databunch())
---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/net/vaosl01/opt/NFS/sw/anaconda3/envs/mer/lib/python3.7/concurrent/futures/process.py", line 232, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/net/vaosl01/opt/NFS/sw/anaconda3/envs/mer/lib/python3.7/concurrent/futures/process.py", line 191, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/net/vaosl01/opt/NFS/sw/anaconda3/envs/mer/lib/python3.7/concurrent/futures/process.py", line 191, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/su0/fastai/fastai/text/transform.py", line 111, in _process_all_1
    return [self.process_text(t, tok) for t in texts]
  File "/home/su0/fastai/fastai/text/transform.py", line 111, in <listcomp>
    return [self.process_text(t, tok) for t in texts]
  File "/home/su0/fastai/fastai/text/transform.py", line 102, in process_text
    for rule in self.pre_rules: t = rule(t)
  File "/home/su0/fastai/fastai/text/transform.py", line 58, in fix_html
    x = x.replace('#39;', "'").replace('amp;', '&').replace('#146;', "'").replace(
AttributeError: 'float' object has no attribute 'replace'
"""

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
<timed exec> in <module>

~/fastai/fastai/data_block.py in _inner(*args, **kwargs)
    391             self.valid = fv(*args, **kwargs)
    392             self.__class__ = LabelLists
--> 393             self.process()
    394             return self
    395         return _inner

~/fastai/fastai/data_block.py in process(self)
    438         "Process the inner datasets."
    439         xp,yp = self.get_processors()
--> 440         for i,ds in enumerate(self.lists): ds.process(xp, yp, filter_missing_y=i==0)
    441         return self
    442 

~/fastai/fastai/data_block.py in process(self, xp, yp, filter_missing_y)
    565             filt = array([o is None for o in self.y])
    566             if filt.sum()>0: self.x,self.y = self.x[~filt],self.y[~filt]
--> 567         self.x.process(xp)
    568         return self
    569 

~/fastai/fastai/data_block.py in process(self, processor)
     66         if processor is not None: self.processor = processor
     67         self.processor = listify(self.processor)
---> 68         for p in self.processor: p.process(self)
     69         return self
     70 

~/fastai/fastai/text/data.py in process(self, ds)
    241         tokens = []
    242         for i in progress_bar(range(0,len(ds),self.chunksize), leave=False):
--> 243             tokens += self.tokenizer.process_all(ds.items[i:i+self.chunksize])
    244         ds.items = tokens
    245 

~/fastai/fastai/text/transform.py in process_all(self, texts)
    115         if self.n_cpus <= 1: return self._process_all_1(texts)
    116         with ProcessPoolExecutor(self.n_cpus) as e:
--> 117             return sum(e.map(self._process_all_1, partition_by_cores(texts, self.n_cpus)), [])
    118 
    119 class Vocab():

/net/vaosl01/opt/NFS/sw/anaconda3/envs/mer/lib/python3.7/concurrent/futures/process.py in _chain_from_iterable_of_lists(iterable)
    474     careful not to keep references to yielded objects.
    475     """
--> 476     for element in iterable:
    477         element.reverse()
    478         while element:

/net/vaosl01/opt/NFS/sw/anaconda3/envs/mer/lib/python3.7/concurrent/futures/_base.py in result_iterator()
    584                     # Careful not to keep a reference to the popped future
    585                     if timeout is None:
--> 586                         yield fs.pop().result()
    587                     else:
    588                         yield fs.pop().result(end_time - time.monotonic())

/net/vaosl01/opt/NFS/sw/anaconda3/envs/mer/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/net/vaosl01/opt/NFS/sw/anaconda3/envs/mer/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

AttributeError: 'float' object has no attribute 'replace'

More specifically, the error is at this step:

.label_for_lm()

I’m trying to debug this, but by default fastai uses multiple CPUs, and it’s hard to figure out errors when multiple processes are involved. I tried passing .label_for_lm(cols=['name', 'item_description'], n_cpus=1), but it kept using multiple CPUs. Furthermore, I couldn’t figure out where exactly the tokenizer is called by following the code.

Any help is appreciated.
Thanks.

Earlier, I was running this piece of code, which gave no errors:

data_lm = (TextList.from_df(texts_df, path, col=['name', 'item_description'], processor=[tok_proc, num_proc])
         .random_split_by_pct(0.1)
         .label_for_lm()
         .databunch())

and now this, which gives the error in the previous post:

data_lm = (TextList.from_df(texts_df, path, cols=['name', 'item_description'], processor=[tok_proc, num_proc])
         .random_split_by_pct(0.1)
         .label_for_lm()
         .databunch())

with the processors:

tok_proc = TokenizeProcessor(mark_fields=True)
num_proc = NumericalizeProcessor(max_vocab=60_091, min_freq=2)

The key part is the argument to TextList.from_df. Initially I had the column argument as just col, which worked, but when I switched to cols it gave the error. I made the switch because I wanted both fields (name and item_description) to show up in the dataset; with just col, only xxfld 1 was produced.

When I used a smaller dataset with cols, it didn’t give an error, and I was able to confirm that both xxfld 1 and xxfld 2 showed up. But when I run it with the full dataset, I get the error.

I’m still trying to debug these errors and I’m hoping @sgugger could clarify these questions:

  1. Why are both col and cols accepted as arguments, and what is the difference between them?
  2. Where exactly is the tokenizer called during .label_for_lm? This is where the error occurs: fix_html is called as part of the tokenizer’s default pre-rules.
  3. How do I force the tokenizer to use only 1 CPU so that it’s easier to debug?

Thanks.


Just an update:

Following my previous post, if I pass col as the argument (where I don’t get the error on the entire dataset) and print a random db.train_ds element, I get the following:

(Text xxbos xxfld 1 xxmaj rave xxmaj outfit xxmaj bundle, Category 0)

Now I try with cols as the argument but with a small dataset (using the entire dataset gives the error I mentioned); printing a random db.train_ds element, I get the following:

(Text xxbos xxfld 1 3 lip gloss for xxunk xxfld 2 lip gloss xxunk 2- c - thru xxunk space,
 Category 0)

As can be seen, the plural cols argument produces xxfld 1 and xxfld 2 markers for name and item_description in the batches (which is what I want), whereas with the singular col argument only the name part seems to show up.

I’m still not sure what difference col versus cols makes when creating the TextList. Will investigate further.

To answer your questions:

  1. col is just ignored; I believe you can pass pretty much any kwarg. In that case, cols defaults to the first column of your dataframe. Are you sure the second column contains string elements all the way to the end? It seems weird that it would fail otherwise.

  2. The tokenizer is called after the labeling in the process call (this line exactly).

  3. To force your own tokenizer, you must pass a custom preprocessor, as you did (see the sketch below).
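
For example, something along these lines (a sketch; it assumes fastai v1’s Tokenizer accepts an n_cpus argument, which controls the ProcessPoolExecutor seen in the traceback):

tok_proc = TokenizeProcessor(tokenizer=Tokenizer(n_cpus=1), mark_fields=True)

Setting defaults.cpus = 1 before building the databunch should also work, since the tokenizer falls back on that default.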

Thank you so much!

Here is a sample of the source dataframe:

	name	item_description
0	MLB Cincinnati Reds T Shirt Size XL	No description yet
1	Razer BlackWidow Chroma Keyboard	This keyboard is in great condition and works ...
2	AVA-VIV Blouse	Adorable top with a hint of lace and a key hol...
3	Leather Horse Statues	New with tags. Leather horses. Retail for [rm]...
4	24K GOLD plated rose	Complete with certificate of authenticity

As you can see, the first column is name and the second column is item_description, both of which are strings:

texts_df.dtypes

name                object
item_description    object
dtype: object

Currently, I’m reading a CSV file containing the texts into a dataframe and using the .from_df class method to create my databunch. Perhaps I could try .from_csv and see if it works.

I don’t like that about Python :frowning:

Update: the problem was with my dataset. There were 10 (just 10!) entries in item_description that were NaNs I hadn’t taken care of, and one of them was causing the problem. Once I fixed that, I was able to create the databunch without any problems, with all the fields marked correctly. Sorry about that!
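
For anyone hitting the same 'float' object has no attribute 'replace' error: pandas reads empty CSV cells as NaN floats, so a quick pre-check along these lines (plain pandas, nothing fastai-specific) catches them before tokenization:

print(texts_df[['name', 'item_description']].isna().sum())    # count NaNs per column
texts_df['item_description'] = texts_df['item_description'].fillna('')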


After building the LM, I’ve now started working on the regression problem. Here is a sample of my training data:

train_df.head()
train_id	name	price	item_description
0	0	MLB Cincinnati Reds T Shirt Size XL	10.0	No description yet
1	1	Razer BlackWidow Chroma Keyboard	52.0	This keyboard is in great condition and works ...
2	2	AVA-VIV Blouse	10.0	Adorable top with a hint of lace and a key hol...
3	3	Leather Horse Statues	35.0	New with tags. Leather horses. Retail for [rm]...
4	4	24K GOLD plated rose	44.0	Complete with certificate of authenticity

Testing data has similar structure.

Using the data block API, I think I was able to create the databunch I want, but I have a few questions about what I got and where to go from here. These are the things I did:

  1. I initialized my custom tokenize and numericalize processors and loaded up my saved language model databunch:
tok_proc = TokenizeProcessor(mark_fields=True)
num_proc = NumericalizeProcessor(max_vocab=60_091, min_freq=2)
data_lm = TextLMDataBunch.load(path, 'lm-toknum', processor=[tok_proc, num_proc])

I called show_batch on this databunch and everything looked good.

  2. Then, using the vocabulary of my LM databunch (data_lm.vocab), I was able to create my databunch. I’m showing the individual steps here to spell out what’s going on. First I created a TextList:
d = TextList.from_df(train_df, path, cols=['name', 'item_description'], vocab=data_lm.vocab)

Question 1: Do I need to pass the custom tokenizer/processor that I used for the LM here? It works even without it, but I don’t see marked fields.

  3. I split by index and performed the labeling. I went with label_from_df, as in tabular databunch creation:
d = d.split_by_idx(valid_idx)
d = d.label_from_df(cols=[dep_var], label_cls=FloatList, log=True)

Question 2: This takes some time; I assume this is where tokenization and numericalization of the training and validation sets happen. Is that right?
Question 3: Does passing the dependent variable in the cols argument along with FloatList set this up as a regression problem, as I think it does?

  4. Next, I added the test set:
d = d.add_test(TextList.from_df(test_df, path, cols=['name', 'item_description'], vocab=data_lm.vocab))

Question 4: Again, do I have to pass my custom tokenize/numericalize processors here?

  5. Finally, I created the databunch:
d = d.databunch()

When I call show_batch on this databunch, I see one column of text and another column of floats (i.e., log values of the price variable).

Question 5: There are two columns of text in the original data frame (name and item_description) representing two fields. Have these two been merged to get one full text field?

Question 6: The fields are not marked (i.e., I don’t see xxfld 1 and xxfld 2 as I do in the LM databunch). I’m guessing I need a custom tokenizer for that. Will that be the one I created for the LM databunch?

Thanks.

So, I decided to go with my intuition and create a databunch for my regression problem. It got created without any errors, but I’m still not 100% sure whether what I have is correct and going to work. Here is the code (pretty much the same as the previous post, simplified):

data_reg = (TextList.from_df(train_df, path, cols=['name', 'item_description'], vocab=data_lm.vocab, processor=[tok_proc, num_proc])
           .split_by_idx(get_rdm_idx(train_df))
           .label_from_df(cols=['price'], label_cls=FloatList, log=True)
           .add_test(TextList.from_df(test_df, path, cols=['name', 'item_description'], vocab=data_lm.vocab, processor=[tok_proc, num_proc]))
           .databunch())

There is one problem. Since data_reg uses the same vocab as data_lm, I would expect the vocabulary sizes to be the same too. But I get different lengths for the stoi's (though the itos's match):

len(data_lm.vocab.itos)
60093
len(data_lm.vocab.stoi)
60093
len(data_reg.vocab.itos)
60093
len(data_reg.vocab.stoi)
295127

I don’t know why data_reg.vocab.stoi is so much bigger than data_reg.vocab.itos. Should they actually be the same, since stoi is created from itos?
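
A likely explanation, assuming fastai v1 builds Vocab.stoi as a defaultdict over itos: every out-of-vocabulary token looked up during numericalization is silently inserted into stoi mapped to index 0 (i.e. xxunk), while itos stays fixed. A minimal sketch of that behaviour:

import collections

itos = ['xxunk', 'xxpad', 'hello']
stoi = collections.defaultdict(int, {s:i for i,s in enumerate(itos)})
len(itos), len(stoi)      # (3, 3)
stoi['never-seen-token']  # returns 0 (xxunk) and inserts the key
len(itos), len(stoi)      # (3, 4)

If that is what’s happening, the larger stoi doesn’t mean the LM vocab was ignored; the extra keys all map to xxunk.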

  1. mark_fields is set to False by default, so you should pass a processor that sets it to True. I think this is a bug, since I believe we decided to default mark_fields to False when there is only one column and to True when there are several; let me check.
  2. Yes, the tokenization and numericalization happen at the end of the labelling.
  3. Absolutely
  4. Same answer as 2 :wink:
  5. Yes, columns are merged to make one big text, with field separators if mark_fields is True.
  6. It should be the same for all your tasks, if you want those fields marked.

Thank you for your replies. It helps me a lot in using the library to do what I want to do.

I’m still not exactly sure why stoi and itos lengths are different for the regression databunch vocab. My concern is that the LM vocab is not being utilized correctly for the regression task (even though I’m passing it in during creation).

Also, fastai.text has a language_model_learner and a text_classifier_learner. What would I need to do to get a custom learner for the regression problem now that my data is ready? Do I create a custom learner from the base class RNNLearner?

Thanks.
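
A note for anyone picking this up later: since FloatList labels set the databunch’s c to 1 and attach an MSE loss, text_classifier_learner may work as a regressor without any custom class. A rough sketch against the fastai v1 API used in this thread, where 'fine-tuned-enc' is a hypothetical name for a saved LM encoder:

learn = text_classifier_learner(data_reg, drop_mult=0.5)
learn.load_encoder('fine-tuned-enc')  # hypothetical saved-encoder name
learn.metrics = [rmse]
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))

If the head dimensions still come out wrong, the PoolingLinearRegressor approach from earlier in the thread is the fallback.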