Chapter 10: why is there no unfreeze call before the initial training of the LM?

Hello,

The language model is loaded, then:

learn.fit_one_cycle(1, 2e-2)

This is called even though the text says the model automatically calls freeze.

Before the next fitting cycle, unfreeze is called. Why is there a difference between the first and the second call?

The objective is to train the embeddings of new words (i.e., words that are part of IMDB but not part of Wikipedia) for one epoch without touching the pre-trained weights.

Hence, the pre-trained model’s weights are frozen and the model is trained for one epoch. This trains only the new embeddings.

Then, we unfreeze the model and continue training it.
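For reference, the full sequence from the chapter looks like this (a minimal sketch, assuming `dls_lm` is the language-model `DataLoaders` built from the IMDB texts):

```python
from fastai.text.all import *

# `dls_lm` is assumed to be a language-model DataLoaders over the IMDB texts.
learn = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3,
                               metrics=[accuracy, Perplexity()])

# language_model_learner returns the learner already frozen, so this
# first epoch only trains the parameter group that contains the
# (partly new) embeddings.
learn.fit_one_cycle(1, 2e-2)

# Unfreeze everything and continue training the whole network.
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)
```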

The language model (like many other models built on pre-trained weights) in fastai is generally structured as at least two parts: a Head (consisting of new, untrained layers) and a Body (the layers of the pre-trained model).

We run one training epoch with the Body frozen, allowing the Head layers to be “initialized”, i.e. trained away from their random starting values.

Then, the Body is unfrozen and the entire network is allowed to train.
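Here’s a hypothetical vision example of the same pattern (assume `dls` is an `ImageDataLoaders`; the learning rates are just placeholders):

```python
from fastai.vision.all import *

# vision_learner attaches a new, untrained Head to a pre-trained
# resnet34 Body and freezes the Body automatically.
learn = vision_learner(dls, resnet34, metrics=error_rate)

learn.fit_one_cycle(1)  # Body frozen: only the new Head is trained

learn.unfreeze()        # now every layer is trainable
learn.fit_one_cycle(3, lr_max=slice(1e-6, 1e-4))  # discriminative LRs
```

(`learn.fine_tune` wraps exactly this freeze-for-one-epoch-then-unfreeze pattern.)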

See: https://docs.fast.ai/tutorial.siamese.html#Train-the-model
and https://docs.fast.ai/vision.learner.html#create_cnn_model (this example is for vision, but the pattern applies to any model built on pre-trained weights)


In the case of the language model (AWD_LSTM here), a new head isn’t actually used. This is because the task remains the same (i.e., predicting the next word), and keeping the pre-trained head is better than attaching a randomly initialised one. Hence, when the model is frozen, only the embeddings of new words (i.e., words in IMDB but not in Wikipedia) are trained, since they are the only part of the model that isn’t pre-trained.
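To make that concrete, here is a rough sketch of the vocabulary-matching idea. fastai does something similar internally when loading the pre-trained weights against a new vocab (it actually initialises the new rows to the mean of the pre-trained embeddings rather than leaving them fully random, but those rows are still the only untrained part). The function name and signature below are illustrative, not fastai’s API:

```python
import torch

def match_embeddings(old_wgts, old_vocab, new_vocab):
    """Build an embedding matrix for new_vocab: reuse the pre-trained row
    when a word was in the Wikipedia vocab, otherwise start from the mean
    pre-trained embedding."""
    old_idx = {w: i for i, w in enumerate(old_vocab)}
    mean_emb = old_wgts.mean(0)
    new_wgts = old_wgts.new_empty((len(new_vocab), old_wgts.size(1)))
    for i, w in enumerate(new_vocab):
        j = old_idx.get(w)
        # IMDB-only words get the mean embedding; the frozen first epoch
        # then trains these embeddings while the pre-trained Body stays fixed.
        new_wgts[i] = old_wgts[j] if j is not None else mean_emb
    return new_wgts
```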

So what you’ve written is generally true, but for the language model it’s a bit different :slight_smile:
