Class imbalance in a multi-label scenario (ULMFiT)

I have been trying to build a multi-label text classifier that determines the tone of a given email. However, most of the labelled samples have a neutral tone, and that is precisely the problem I am facing. Is there any way to address this class imbalance? Thanks.

Any progress on this? I am also looking for a solution for text classification where the labelled dataset is biased towards one label.

Nope. The closest I could come was to use imbalanced-learn (imblearn), but it did not yield good results.

I have been facing the same issue with an imbalanced dataset. In particular, I was trying to predict the rating (label) from Amazon reviews. The classifier performed really badly on the rating classes with few reviews … :(

No idea how this can be fixed! :roll_eyes:

I have heard of a loss function called focal loss. Maybe that would be of some help.
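
To make the idea concrete, here is a minimal sketch of focal loss applied per label, which is how it would typically be used in a multi-label setup with sigmoid outputs. It is plain Python so the formula is easy to follow; in a real ULMFiT pipeline you would presumably write the same thing with tensors. The function names are made up for illustration, and the defaults `gamma=2.0` and `alpha=0.25` are the values commonly quoted from the original focal loss paper, not something tuned for this problem.

```python
import math


def binary_focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for a single label: -alpha_t * (1 - p_t)**gamma * log(p_t).

    prob   -- predicted probability that the label is present (after sigmoid)
    target -- 1 if the label is present, 0 otherwise
    """
    # Clamp to avoid log(0).
    prob = min(max(prob, eps), 1.0 - eps)
    # p_t is the probability the model assigned to the correct outcome.
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t)**gamma shrinks the loss of examples the model already gets right.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Mean focal loss over all labels of one sample (hypothetical helper)."""
    losses = [binary_focal_loss(p, t, gamma, alpha) for p, t in zip(probs, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    easy = binary_focal_loss(0.9, 1)  # confident and correct
    hard = binary_focal_loss(0.1, 1)  # confident and wrong
    print(easy < hard)
```

With `gamma=0` and `alpha=0.5` this reduces to half of the usual binary cross-entropy, so `gamma` is the knob that controls how strongly well-classified examples (such as the many neutral emails) are down-weighted.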

What about data augmentation? This article explains how you can increase the amount of training data using augmentation techniques. Here’s the library if you want to try it.
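
For anyone who wants to see what text augmentation can look like before picking a library, below is a rough sketch using two simple token-level perturbations (random swap and random deletion). This is an assumption about the kind of technique meant here, not necessarily what the article or the library above does, and all function names are hypothetical. The idea would be to generate extra variants only for the under-represented tones.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions swapped."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, n_copies=3, p_delete=0.1, seed=0):
    """Create n_copies perturbed variants of text (whitespace tokenisation)."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    tokens = text.split()
    variants = []
    for _ in range(n_copies):
        variant = random_swap(tokens, 1, rng)
        variant = random_deletion(variant, p_delete, rng)
        variants.append(" ".join(variant))
    return variants


if __name__ == "__main__":
    for line in augment("I am really disappointed with the late delivery"):
        print(line)
```

Perturbations like these can change the tone of a sentence (deleting "not", for example), so it is worth spot-checking the generated samples before training on them.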