Multi-class classification with imbalanced data in ULMFiT

Hi everyone! I have an imbalanced dataset where the majority class accounts for 60% of the samples and the other 9 labels make up the remaining 40%. I am aware of over/undersampling, but is there any other way to handle this problem with ULMFiT?

One way I am handling this with BERT is by using class weights, so that the minority classes get higher importance in the loss. Can ULMFiT incorporate something like that?

You could pass a weight tensor to your loss function to upweight the minority classes; there should be a `weight` keyword in the loss function you're using.

For more info on the weight keyword check out the PyTorch docs, e.g. for CrossEntropyLoss: https://pytorch.org/docs/master/generated/torch.nn.CrossEntropyLoss.html
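As a minimal sketch of what that looks like in plain PyTorch (the class counts below are made up to roughly match the 60%/40% split described above, and the inverse-frequency weighting scheme is just one common choice, not the only one):

```python
import torch
import torch.nn as nn

# Hypothetical class counts: one majority class (~60%) and nine minority classes
class_counts = torch.tensor([600., 44., 44., 45., 44., 45., 44., 45., 44., 45.])

# Inverse-frequency weights, normalized so the average weight is 1.0;
# rarer classes get larger weights, so their errors cost more
weights = class_counts.sum() / (len(class_counts) * class_counts)

# Pass the weights via the `weight` keyword of CrossEntropyLoss
loss_func = nn.CrossEntropyLoss(weight=weights)

# Dummy batch: 4 samples, 10 classes
logits = torch.randn(4, 10)
targets = torch.tensor([0, 3, 7, 9])
loss = loss_func(logits, targets)
```

Since ULMFiT is trained through a fastai `Learner`, you should be able to assign a weighted loss to it (e.g. setting `learn.loss_func` to a cross-entropy loss constructed with `weight=...`) — check the fastai docs for the exact loss wrapper your version uses.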

This article also has a lot on imbalanced data: https://machinelearningmastery.com/random-oversampling-and-undersampling-for-imbalanced-classification/
