AWD_LSTM + Transformer

Hi there, I have a quick question. When I use AWD_LSTM in my language_model_learner, I save the encoder and then use this classifier: `classifier = text_classifier_learner(data_clas, Transformer, drop_mult=0.2)`

It doesn't work, and I get this error:

Missing key(s) in state_dict: "pos_enc.weight"…
Unexpected key(s) in state_dict: "encoder_dp.emb.weight"…
size mismatch for encoder.weight: copying a param with shape torch.Size([…])…

Is it because the two have different architectures (one is an RNN, the other a regular NN with attention layers), or is it because of something else?
Thank you.

Yes, if you want to fine-tune a text classifier from a pre-trained language model, both need to have the same architecture (except for the task-specific head, of course). The encoder weights from the pre-trained AWD-LSTM language model are not compatible with the Transformer architecture you want to use as the downstream text classification model.
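For reference, here is a minimal sketch of the matching workflow in fastai v1, assuming `data_lm` and `data_clas` are your language-model and classifier DataBunches, and `'ft_enc'` is just an illustrative file name. The key point is that the classifier is built with the same AWD_LSTM architecture whose encoder was saved:

```python
from fastai.text import *

# Fine-tune the language model with the AWD_LSTM architecture
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.save_encoder('ft_enc')  # illustrative name; saves only the encoder weights

# Build the classifier with the SAME architecture, then load the saved encoder
learn_clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.2)
learn_clas.load_encoder('ft_enc')  # works because the state_dicts now line up
learn_clas.fit_one_cycle(1, 1e-2)
```

Alternatively, if you specifically want a Transformer classifier, train the language model with `Transformer` as the architecture in the first place, so the encoder you save already matches what the classifier expects.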


Thank you, Stefan.