Incremental learning in fastai?

I created an NLP classification model using text_classifier_learner and saved it for future use. I can now load any new data and run predictions without problems using load_learner and get_preds.

My questions are the following:

  1. If we obtain a new set of data that we want to use to train the model, do we have to train the model from scratch? Or can we continue training from the previously trained model with the additional data, like continuous online learning?
  2. How do we store the vocab from the first training run and reuse it later to improve the model?

Thanks to anyone for reading this post!

Regarding 1: You don’t need to train the model from scratch. You can fine-tune your text classifier by following essentially the same process as fine-tuning a text classifier from a pre-trained language model, except that here you would fine-tune a pre-trained text classifier on the new data. Just make sure to load the entire pre-trained model after creating the learner (not only the encoder, as you would for language models).
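To make this concrete, here is an untested sketch of what that could look like with the fastai v1 text API. The file names, model names, epochs, learning rates, and `drop_mult` value are placeholders I made up for illustration, not anything from your setup:

```python
from fastai.text import *

# Build a databunch for the new data, reusing the old vocab (see point 2)
data_new = TextClasDataBunch.from_csv(path, 'new_data.csv', vocab=data_old.vocab)

# Create the learner, then load the FULL previously trained classifier,
# not just the encoder (learn.load, not learn.load_encoder)
learn = text_classifier_learner(data_new, AWD_LSTM, drop_mult=0.5)
learn.load('my_saved_classifier')

# Continue training on the new data
learn.fit_one_cycle(2, slice(1e-3 / (2.6 ** 4), 1e-3))
learn.save('my_saved_classifier_v2')
```

You can of course also unfreeze gradually and use discriminative learning rates here, exactly as in the original classifier fine-tuning.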

Regarding 2: When you save the databunch that the text classifier was trained on, you are also saving the corresponding vocab. So when you want to fine-tune a new classifier, you can load the previously saved databunch and pass in the old vocab when creating the new databunch, e.g. data_new = (TextList.from_folder(path, vocab=data_old.vocab)...)
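If you would rather not keep the whole databunch around, you can also persist just the vocab itself. In fastai v1 a `Vocab` is essentially its `itos` list (index-to-token), which pickles cleanly. The snippet below shows the round trip with a toy list standing in for `data_old.vocab.itos`; the tokens and file name are made up for illustration:

```python
import pickle

# Toy stand-in for a fastai Vocab's `itos` (index-to-token) list.
# In fastai v1 you would persist data_old.vocab.itos the same way.
itos = ["xxunk", "xxpad", "the", "movie", "was", "great"]

# Save the vocab alongside your exported model
with open("vocab_itos.pkl", "wb") as f:
    pickle.dump(itos, f)

# Later, restore it and rebuild the token-to-index mapping
with open("vocab_itos.pkl", "rb") as f:
    restored = pickle.load(f)

stoi = {tok: i for i, tok in enumerate(restored)}  # token-to-index
print(restored == itos)  # True
```

From the restored list you can then pass the vocab back in when building the new databunch (in fastai v1, I believe `Vocab(restored)` reconstructs the vocab object, but double-check against the version you are on).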