Improving a Text Classification model

vishak · October 17, 2020, 12:48pm

So I’m trying to train a classifier and followed the usual steps: LM -> fine-tuned LM -> classification model. I’m at 80% accuracy now. What else can I try in fastai?

stefan-ai · October 17, 2020, 1:49pm

The first thing I would do is a thorough error analysis. Print out the model’s top losses and try to understand which errors it makes. Maybe you’ll find some mislabeled examples that you could clean up to improve performance.
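To make the idea concrete, here is a minimal sketch of what "top losses" means: compute the per-example cross-entropy and sort descending. The `top_losses` helper and the toy probabilities are hypothetical, made up for illustration only. In fastai itself, assuming you have a trained `learn`, something like `ClassificationInterpretation.from_learner(learn).plot_top_losses(10)` should give you the same kind of view without writing this by hand.

```python
import math

def top_losses(probs, labels, k=3):
    """Return the k worst examples as (index, loss, predicted, actual) tuples."""
    rows = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Cross-entropy for one example: -log of the probability given to the true class
        loss = -math.log(max(p[y], 1e-12))
        # Predicted class is the index with the highest probability
        pred = max(range(len(p)), key=p.__getitem__)
        rows.append((i, loss, pred, y))
    # Highest loss first: these are the examples the model is most wrong about
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]

# Hypothetical predicted probabilities for 4 examples over 2 classes
probs = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.4, 0.6]]
labels = [0, 1, 1, 0]

worst = top_losses(probs, labels, k=2)
for idx, loss, pred, actual in worst:
    print(f"example {idx}: loss={loss:.3f} predicted={pred} actual={actual}")
```

Examples that are confidently wrong end up at the top, and those are the ones worth reading by hand to spot mislabeled data.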

Some other ideas of what to try next:

- Use subword tokenization instead of the default word-based spaCy tokenizer.
- Maybe you can think of some additional text pre-processing that could help for your specific dataset (even though neural networks generally don’t need much pre-processing).
- Have a look at the blurr library, which integrates Hugging Face transformers with fastai v2. You could try to fine-tune pre-trained transformer models using this framework.
- Data augmentation, e.g. MixUp or back-translation.
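On the MixUp idea, here is a rough sketch of the core operation in plain Python. Since tokens are discrete, for text it is usually applied to embeddings or hidden representations rather than to the raw tokens, so the vectors below stand in for embedded examples. The `mixup` helper, the `alpha` value, and the toy vectors are all assumptions for illustration, not how fastai implements it.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two (features, one-hot label) pairs with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    # Same weight is applied to the inputs and to the labels
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Hypothetical embedded examples (3-dim) with one-hot labels for 2 classes
x_a, y_a = [1.0, 0.0, 2.0], [1.0, 0.0]
x_b, y_b = [0.0, 4.0, 2.0], [0.0, 1.0]

rng = random.Random(0)  # seeded so the run is repeatable
x_mix, y_mix, lam = mixup(x_a, y_a, x_b, y_b, rng=rng)
print(lam, x_mix, y_mix)
```

The mixed label is a soft target, so the loss has to accept soft labels, or equivalently be computed as the same weighted sum of the two per-label losses.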

vishak · October 18, 2020, 4:31am

Thanks a lot! Will try all of these out.