Hi,
I am trying to use a language model which I have trained on wiki-103 to perform classification.
Everything appears to work well if I follow the IMDB example: fine-tune the language model, then load its encoder into the classification model. That gives an F1 score of ~84%.
I then wanted to see what the result would be if I didn’t fine-tune the language model, that is, if I used the wiki-103 language model directly. I tried loading it with
learn = text_classifier_learner(data_class, drop_mult=0.5)
learn.load_pretrained('best_model.pth', 'dict.pkl')
with the wiki-103 model trained using data_lm and saved as
learn_lm.save('best_model')
and the dict saved as
# Save the language model's vocab (index-to-string list); the with-block closes the file
with open(path_lang_model/'models/dict.pkl', 'wb') as pickle_out:
    pickle.dump(data_lm.vocab.itos, pickle_out)
but I get the following error, so I assume this is not the correct way to do it:
RuntimeError: Error(s) in loading state_dict for SequentialRNN:
Missing key(s) in state_dict: "1.layers.0.weight", "1.layers.0.bias", "1.layers.0.running_mean", "1.layers.0.running_var", "1.layers.2.weight", "1.layers.2.bias", "1.layers.4.weight", "1.layers.4.bias", "1.layers.4.running_mean", "1.layers.4.running_var", "1.layers.6.weight", "1.layers.6.bias".
Unexpected key(s) in state_dict: "1.decoder.weight", "1.decoder.bias".
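To make the error easier to read, here is a minimal sketch of how I understand the two lists: "missing" keys are ones the classifier expects but the saved file lacks, and "unexpected" keys are ones the file has but the classifier does not. The key sets below are abbreviated from the message above, and "0.encoder.weight" is an assumed shared key added only for illustration; diff_keys is a hypothetical helper, not a library function.

```python
# Hypothetical, abbreviated key sets based on the error message above.
classifier_keys = {
    "0.encoder.weight",   # assumed shared encoder key, for illustration only
    "1.layers.0.weight",
    "1.layers.0.bias",
    "1.layers.2.weight",
}
language_model_keys = {
    "0.encoder.weight",   # assumed shared encoder key, for illustration only
    "1.decoder.weight",
    "1.decoder.bias",
}


def diff_keys(expected, provided):
    """Return (missing, unexpected) key lists, sorted for stable output."""
    missing = sorted(expected - provided)      # wanted by the model, absent in the file
    unexpected = sorted(provided - expected)   # present in the file, unknown to the model
    return missing, unexpected


missing, unexpected = diff_keys(classifier_keys, language_model_keys)
print("Missing:", missing)
print("Unexpected:", unexpected)
```

Read this way, the classifier head ("1.layers.*") is absent from the language model file, and the language model's decoder ("1.decoder.*") has no place in the classifier, which suggests the whole language model cannot be loaded into the classifier as-is.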
I then tried loading just the encoder, but this fails because the vocab sizes differ: 30,004 tokens for the language model and only 900 for my classification task.
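As far as I understand it, the embedding matrix has one row per vocab token, so a 30,004-row matrix cannot be copied into a 900-row one by index. One possible workaround is to remap rows by token instead. The following is only a plain-Python sketch of that idea on toy data: remap_embeddings, the toy vocabs, and the mean-row fallback for unseen tokens are my own assumptions, not a fastai API.

```python
def remap_embeddings(old_itos, old_rows, new_itos):
    """Build one embedding row per token in new_itos.

    Tokens that also appear in old_itos reuse their old row; tokens the
    old vocab never saw fall back to the mean of all old rows.
    """
    old_stoi = {tok: i for i, tok in enumerate(old_itos)}
    dim = len(old_rows[0])
    mean_row = [sum(row[d] for row in old_rows) / len(old_rows) for d in range(dim)]
    return [
        list(old_rows[old_stoi[tok]]) if tok in old_stoi else list(mean_row)
        for tok in new_itos
    ]


# Toy data (hypothetical): 4-token language model vocab, 3-token classifier vocab.
lm_itos = ["the", "cat", "sat", "mat"]
lm_rows = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0]]
clas_itos = ["cat", "dog", "the"]

clas_rows = remap_embeddings(lm_itos, lm_rows, clas_itos)
print(len(clas_rows), "rows for", len(clas_itos), "tokens")
```

Here "cat" and "the" keep their language model rows, while "dog" gets the mean row because the toy language model vocab does not contain it.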
What is the correct way to load a non-fine-tuned language model into a text_classifier_learner()?
If I don’t load anything, I still get an F1 score of ~80%. I am confused about what happens when the encoder is not loaded. If the model just uses the default initialization, it appears to do pretty well. I assume that if the encoder is not loaded into the text_classifier_learner(), I am not using transfer learning at all. Is this correct?