What if we don't use a language model encoder for the text classifier?

Hi guys, I'm building a text classifier. According to the course, we need to do the following (sketched in code after the list):

  1. build a language model, learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
  2. save its encoder, learn_lm.save_encoder('encoder')
  3. build a text classifier, learn_clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
  4. load the language model's encoder, learn_clas.load_encoder('encoder')
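For context, here is a minimal sketch of those four steps in one place using the fastai v1 API. The path, file name, and epoch/learning-rate choices are placeholders, and it assumes data_lm and data_clas are built from a hypothetical texts.csv of your own data:

```python
from fastai.text import *

# Placeholder: folder containing your dataset file 'texts.csv'
path = Path('my_data')
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv', vocab=data_lm.vocab)

# 1. Fine-tune the Wikipedia-pretrained language model on your corpus
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.unfreeze()
learn_lm.fit_one_cycle(1, 1e-3)

# 2. Save the fine-tuned encoder
learn_lm.save_encoder('encoder')

# 3. Build the classifier (same vocab as the language model)
learn_clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)

# 4. Load the fine-tuned encoder, then train the classifier
learn_clas.load_encoder('encoder')
learn_clas.fit_one_cycle(1, 1e-2)
```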

I'm wondering: what if I don't build the language model and don't use its encoder for the classifier? In other words, if I only do step 3, will the classifier build its own encoder to understand sentences, or do we have to build a language model before building a text classifier?

Thank you for any clues!

I think that if you don't fine-tune a language model on your domain-specific dataset and go directly to step 3 (building a classifier), it will by default still use the standard language model (and encoder) pretrained on Wikipedia. You will likely need to train it for longer, and you might not achieve as good quality as with fine-tuning.
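If you want to see what the classifier-only path looks like, a minimal sketch follows (the epoch count and learning rate are placeholders). The key point is that text_classifier_learner defaults to pretrained=True, so the AWD_LSTM encoder is still initialized from the Wikipedia-pretrained weights rather than from scratch:

```python
# Classifier only (step 3): no fine-tuned encoder is loaded.
# pretrained=True (the default) means the encoder still starts from the
# Wikipedia-pretrained AWD_LSTM weights, not random initialization.
learn_clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5, pretrained=True)
learn_clas.fit_one_cycle(4, 1e-2)  # placeholder schedule; typically needs more training than the fine-tuned route
```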

You can experiment with it - the code to start with is here: https://docs.fast.ai/text.html

Thank you so much! I ran the experiment on my own dataset, and it turns out that, yes, training takes longer, but the accuracy when NOT building a language model and NOT using its encoder is much higher. That makes me unsure whether it is actually right to only do step 3 (build the classifier)?

I would then assume that the standard Wikipedia-pretrained model generalizes better to your test set than your fine-tuned language model does. Most often, fine-tuning should produce better results.

Ah, I see, thank you a lot!