How to handle an imbalanced NLP dataset?

I am working on a dataset with around 2,000 records.

Around 80% of the records are labeled.

There are around 200 categories. Some categories have more than 20 records, whereas others have only two.
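To make the imbalance concrete, here is a minimal sketch of how the per-category counts could be summarized. The function name `summarize_labels`, the `rare_threshold` parameter, and the toy labels are all made up for illustration; they are not my real data, and unlabeled records are assumed to be stored as `None`.

```python
from collections import Counter

def summarize_labels(labels, rare_threshold=5):
    """Count records per category, ignoring unlabeled (None) records.

    Assumes at least one labeled record is present.
    """
    counts = Counter(label for label in labels if label is not None)
    # Categories with fewer than `rare_threshold` records, sorted by name
    rare = sorted(c for c, n in counts.items() if n < rare_threshold)
    return {
        "labeled": sum(counts.values()),
        "unlabeled": sum(1 for label in labels if label is None),
        "categories": len(counts),
        "largest": max(counts.values()),
        "smallest": min(counts.values()),
        "rare": rare,
    }

# Toy labels, invented for illustration only
labels = ["billing"] * 20 + ["shipping"] * 6 + ["refund"] * 2 + [None] * 7
summary = summarize_labels(labels)
print(summary)
```

On the real data, the `rare` list would show which categories sit at the two-record end of the distribution.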

Since this is a text dataset, I cannot up-sample the minority categories with techniques like the ones I could use for images.

So what can I do about it?