Add pre-trained word embeddings from a text file to text_classifier_learner

yashuseth · December 26, 2018, 8:18pm

I want to train a text classifier without using a pre-trained language model. I only want to use pre-trained word embeddings. I have the word embeddings in a text file.

I see there is an option to pass pretrained weights to language_model_learner through the pretrained_fnames parameter, but this parameter is not available for text_classifier_learner.

Is there a way to load the word embeddings from a text file instead of starting from randomly initialized embeddings?

Also, I am not able to figure out exactly how to use the pretrained_fnames parameter of language_model_learner. Is there a small working example for this?

Thanks in advance for any help.

rohit_gr · December 27, 2018, 12:02am

I think what you can do is simply:
-> Create learn with text_classifier_learner
-> Load the pretrained text file and build the required dictionary (dict), using the layer names in learn.model as keys
-> Use learn.model.load_state_dict(dict)
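
The steps above can be sketched roughly as follows. This is only a sketch under assumptions, not fastai's own API: the helper names (read_embeddings, build_weight_matrix, load_into_model) are made up for illustration, the file is assumed to be in GloVe-style "word v1 v2 ..." format, and the state-dict key for the embedding layer has to be read from learn.model.state_dict().keys() because it differs between fastai versions. The embedding size of the model must also match the dimension of the vectors in the file.

```python
import random


def read_embeddings(lines):
    """Parse 'word v1 v2 ...' lines (GloVe-style text format) into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 3:
            # Skips blank lines and a word2vec-style "count dim" header line.
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_weight_matrix(itos, vectors, dim, seed=0):
    """Return one row per vocabulary token, in vocabulary order.

    Tokens without a pretrained vector of the right size get small
    random values instead.
    """
    rng = random.Random(seed)
    rows = []
    for token in itos:
        vec = vectors.get(token)
        if vec is None or len(vec) != dim:
            vec = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        rows.append(vec)
    return rows


def load_into_model(model, weights, key):
    """Overwrite one entry of the model's state dict and load it back.

    `key` is an assumption: pick it from model.state_dict().keys().
    """
    import torch  # imported here so the parsing helpers work without PyTorch

    state = model.state_dict()
    new_weight = torch.tensor(weights, dtype=state[key].dtype)
    if new_weight.shape != state[key].shape:
        raise ValueError(f"shape mismatch: {new_weight.shape} vs {state[key].shape}")
    state[key] = new_weight
    model.load_state_dict(state)
```

Usage would look something like this, where itos is the vocabulary list of your data (in vocabulary order) and the file name and key are placeholders: open the file with `open("embeddings.txt", encoding="utf-8")`, pass it to read_embeddings, build the matrix with build_weight_matrix(itos, vectors, dim), then call load_into_model(learn.model, weights, key).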

According to the code, pretrained_fnames seems to be a list of two filenames: the pretrained language model weights (.pth) and the itos of the pretrained dataset (.pkl).