A tutorial for fine-tuning BERT in Fastai

I’ve been looking at this tutorial and other applications of BERT in document summarisation and the latest Jigsaw challenge for toxic comment classification. One important point that I see in all of these applications is that the BERT language model is not being fine-tuned on the corpus under test. So if we follow the process outlined by the figure below, people are essentially jumping from Step 1 (getting the pre-trained language model) to Step 3 (fine-tuning the classification model).

If I look at the ablation studies in the original ULMFiT paper (Tables 6 and 7), it seems that there is 1%-2% to be gained in error performance by fine-tuning the language model on the corpus before fine-tuning the classification model. In looking through the code available in the pytorch-pretrained-bert package, it does not seem to include any process for fine-tuning the language model. There seems to be an expectation that you will take the model as is.

2 Likes