Improving a Text Classification model

So I’m trying to train a classifier and followed the usual pipeline: pretrained LM -> fine-tuned LM -> classification model. I’m at 80% accuracy now. What else can I try in fastai?

The first thing I would do is a thorough error analysis. Print out the model’s top losses and try to understand which errors it makes. Maybe you’ll find some mislabeled examples that you could clean up to improve performance.
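
To make the idea concrete, here is a minimal sketch of what "top losses" means, written in plain Python so it runs without fastai. The predictions, labels, and texts are made-up placeholders, and `top_losses` is a hypothetical helper, not a fastai function. In fastai itself, `ClassificationInterpretation.from_learner(learn)` followed by `interp.plot_top_losses(k)` should give you the same kind of ranking for your trained learner.

```python
import math

def top_losses(probs, targets, k=3):
    """Return (loss, index) pairs for the k examples with the highest cross-entropy loss."""
    losses = []
    for i, (p, t) in enumerate(zip(probs, targets)):
        # Per-example cross-entropy: -log of the probability assigned to the true class.
        # The clamp avoids log(0) for fully confident wrong predictions.
        losses.append((-math.log(max(p[t], 1e-12)), i))
    return sorted(losses, reverse=True)[:k]

# Made-up data for illustration only: class probabilities, true labels, raw texts.
probs = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.4, 0.6]]
targets = [0, 0, 1, 1]
texts = ["great movie", "not bad at all", "terrible, loved it", "pretty good"]

for loss, i in top_losses(probs, targets, k=2):
    predicted = probs[i].index(max(probs[i]))
    print(f"loss={loss:.2f} label={targets[i]} predicted={predicted} text={texts[i]!r}")
```

Examples where the model is confidently wrong float to the top, and those are the ones worth reading by hand: they tend to be either mislabeled or genuinely hard cases.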

Some other ideas of what to try next:

  • use subword tokenization instead of the default word-based spaCy tokenizer
  • think about additional text pre-processing that could help for your specific dataset (even though neural networks generally don’t need much pre-processing)
  • have a look at the blurr library, which integrates Hugging Face transformers with fastai v2. You could try to fine-tune pre-trained transformer models using this framework.
  • data augmentation: e.g. MixUp or back-translation
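
As a rough illustration of the MixUp idea from the last bullet: two training examples are blended with a random weight, and their labels are blended with the same weight. The sketch below is plain Python on toy vectors, not fastai code. It assumes the mixing happens on continuous representations (e.g. embeddings), since token ids can't be meaningfully interpolated; the function name `mixup` and the `alpha=0.4` default are illustrative choices.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two (features, one-hot label) pairs with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    # The same weight is applied to the inputs and to the labels.
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Toy 3-dimensional "embeddings" with one-hot labels for a 2-class problem.
x_a, y_a = [1.0, 0.0, 2.0], [1.0, 0.0]
x_b, y_b = [0.0, 4.0, 0.0], [0.0, 1.0]

random.seed(0)
x_mix, y_mix, lam = mixup(x_a, y_a, x_b, y_b)
print(f"lam={lam:.3f} x={x_mix} y={y_mix}")
```

The mixed label is a soft target, so the loss has to accept class probabilities rather than a single class index.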

Thanks a lot! Will try all of these out.