Class imbalance in a multi-label scenario (ULMFiT)

prithviraj23 · July 21, 2019, 3:20am

I have been trying to build a multi-label text classifier to determine the tone of a given email. However, most of the labelled samples have a neutral tone, and that is precisely the problem I am facing. Is there any way to address this class imbalance? Thanks.

celiberate · September 3, 2019, 2:17pm

Any progress on this? I am also looking for a solution for text classification where the labelled dataset is biased towards one label.

prithviraj23 · October 17, 2019, 4:20am

Nope. The closest I could come was to use imbalanced-learn (imblearn), but it does not yield good results.

Preka · November 1, 2019, 2:45pm

I have been facing the same issue with an imbalanced dataset. In particular, I was trying to predict the rating (label) from Amazon reviews. The classifier was really bad for the rating groups with few reviews … :(.

No idea how this can be fixed!

prithviraj23 · November 17, 2019, 5:38pm

I have heard of a loss function called focal loss. Maybe that would be of some help.
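For anyone who wants to see what that looks like, here is a minimal sketch of focal loss for the multi-label case, where each label is treated as an independent binary decision. It is plain Python so the arithmetic is easy to follow, not a drop-in fastai loss. The function names are made up for illustration, and the defaults `gamma=2.0` and `alpha=0.25` are the commonly quoted starting values, not something tuned for email tone.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Mean focal loss over independent binary labels.

    logits  : raw model outputs, one per label
    targets : 0/1 ground truth, one per label
    gamma   : focusing parameter; 0 turns the modulating factor off
    alpha   : weight for positive labels (negatives get 1 - alpha)
    Both defaults are assumptions for illustration.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        # p_t is the probability the model assigns to the true outcome.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # Clamp to avoid log(0) on extremely confident wrong predictions.
        p_t = max(p_t, 1e-12)
        # (1 - p_t) ** gamma shrinks the loss of well-classified labels.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(logits)


# One email with three tone labels: the first is an easy positive,
# the second an easy negative, the third a hard (missed) positive.
loss = binary_focal_loss([4.0, -4.0, -2.0], [1, 0, 1])
```

The idea is that labels the model already gets right (here, the abundant neutral examples) contribute very little, so the gradient is dominated by the hard, rare ones. With `gamma=0` and `alpha=0.5` this reduces to half of ordinary binary cross-entropy.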

newvick · November 17, 2019, 11:04pm

What about data augmentation? This article gives an explanation of how you can increase the amount of training data using those techniques. Here’s the library if you want to try it.
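To give a rough idea of what text augmentation can look like, here is a small sketch using two simple token-level operations, random deletion and random swap. This is not the API of the library mentioned above; the function names, the whitespace tokenizer, and the parameter defaults are all assumptions for illustration.

```python
import random


def random_deletion(tokens, p=0.1, rng=random):
    """Drop each token with probability p, always keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps=1, rng=random):
    """Swap two randomly chosen positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def augment(text, n_copies=2, seed=0):
    """Return n_copies noisy variants of text (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    tokens = text.split()      # naive whitespace tokenization
    variants = []
    for _ in range(n_copies):
        noisy = random_deletion(tokens, rng=rng)
        noisy = random_swap(noisy, rng=rng)
        variants.append(" ".join(noisy))
    return variants


extra = augment("please send the report by friday", n_copies=3)
```

For the imbalance problem in this thread, you would probably apply something like this only to emails carrying the rare tone labels, and give each variant the same labels as the original. Whether noisy copies preserve tone well enough is something to check on a validation set.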