Customized Loss function with weights for Multi-label multi-classification

Which loss function is used by the fast ai multi-label multi-classification lesson?

  1. nn.BCEWithLogitsLoss or
  2. nn.MultiLabelMarginLoss

If 2, then it means we can’t use class weights to handle the class imbalance problem. Has anyone used nn.BCEWithLogitsLoss with class weights to handle class imbalance in multi-label classification? If so, I would like to know if it worked for you.

I have tried the following but I get dimension errors.

_, class_counts = np.unique(data.y.items, return_counts=True)
weights = np.sum(class_counts)/class_counts
weights = tensor(weights).float().cuda()
learn.loss_func.func = nn.BCEWithLogitsLoss(pos_weight=weights)

I get the following error:
RuntimeError: The size of tensor a (576) must match the size of tensor b (21) at non-singleton dimension 0
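For reference, the pos_weight tensor of nn.BCEWithLogitsLoss must have one entry per class, matching the last dimension of the logits and targets. A minimal sketch (the 21 classes and batch size of 4 are illustrative):

```python
import torch
import torch.nn as nn

n_classes = 21
pos_weight = torch.ones(n_classes)                     # one weight per class
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits  = torch.randn(4, n_classes)                    # raw model outputs
targets = torch.randint(0, 2, (4, n_classes)).float()  # multi-hot labels
loss = loss_fn(logits, targets)                        # shapes broadcast on the class axis
```

A pos_weight of any other length (e.g. 576 here) fails to broadcast against the class dimension and raises exactly this kind of "size of tensor a must match size of tensor b" RuntimeError.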


Hi @shbkan!
It looks like your model’s last layer is generating a bigger tensor than it is supposed to.
If you have built your own model with nn.Sequential(…), check the size of the tensor produced by the last layer.

I don’t think the error is due to that. The error only occurs when I set the loss function to this customized version. It’s something about the way the custom function is assigned to learn.loss_func.func.

Hey @shbkan!

Then I think this loss function is flattening the output tensor but not flattening the label tensor.
I have faced this error before, and I am fairly sure it is due to a mismatch between the two tensors.

Hi @shbkan, @RushabhVasani24 – I have the same problem that @shbkan described.

Did you solve the problem regarding tensor mismatch, and how exactly?

The code I used for testing:

pos_weight = torch.ones([data_clas.c]).float().cuda()
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5, wd=0.1, 

And then when training (or calculating learning rate), the same error with tensor sizes occurs.

I have not figured it out yet. I tried a few different things but could not make it work. I hope someone explains how to use a customized loss function with fastai in general, because I see several posts where people are struggling with it. It is an important topic, since most real-life datasets are imbalanced and therefore need weight-sensitive loss functions.


I am experimenting with nn.BCEWithLogitsLoss as well at the moment and have a multi-label dataset with 44 possible classes. If you run

_, class_counts = np.unique(data.y.items, return_counts=True)

I am sure you will see a length of 576. It seems that class_counts does not store counts for single classes only, but also for the different combinations of labels that occur together in multi-label samples. You can check that with np.unique(data.y.items), which will show results looking like list([41, 43, 44]). Hence you have to generate your weights another way. In my case I had only one such combination with a single observation, so I simply excluded it; sorry that I can’t share more useful code with you.
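For what it’s worth, one way to get true per-class counts from multi-label targets is to flatten the label lists before counting. A hedged sketch, where items is a toy stand-in for data.y.items (a list of label-index lists, as in the snippet above):

```python
import numpy as np

# Toy stand-in for data.y.items: each element lists the class
# indices present in one sample.
items = [[0, 2], [1], [0, 1, 2], [2]]

# Flatten all label lists into one array, then count occurrences of
# each individual class rather than of each label combination.
flat = np.concatenate([np.asarray(labels) for labels in items])
classes, class_counts = np.unique(flat, return_counts=True)
# classes      -> [0, 1, 2]
# class_counts -> [2, 2, 3]
```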

On a side note, I think you need to change weights = np.sum(class_counts)/class_counts to weights = class_counts/np.sum(class_counts) to get the correct weights.


Also interested whether fastai has any loss functions for multi-label problems that allow weighting.


Has anyone found one yet? I also need a weighted multi-label loss function.

You could use nn.BCEWithLogitsLoss to train your model; the only downside is that Learner.predict will give you raw outputs (without the sigmoid activation and thresholding).

loss_func = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
learn = cnn_learner(...)
learn.loss_func = loss_func
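Since Learner.predict would then return raw logits, you would need to apply the sigmoid and threshold yourself. A minimal sketch (the 0.5 threshold is an assumption; tune it for your dataset):

```python
import torch

raw = torch.tensor([2.0, -1.0, 0.3])  # raw logits, as Learner.predict would return
probs = torch.sigmoid(raw)            # convert to independent per-class probabilities
preds = probs > 0.5                   # threshold into multi-label predictions
```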

The issue is caused by the initialiser of BaseLoss (the parent class that all fastai loss functions subclass) having flatten=True as the default, an option that is not exposed in BCEWithLogitsLossFlat's initialiser.
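To illustrate why flattening breaks pos_weight: once logits and targets are collapsed to one dimension, the per-class weight vector no longer lines up with any class axis. A sketch in plain PyTorch (the shapes are illustrative; the manual .view(-1) mimics what flatten=True does):

```python
import torch
import torch.nn as nn

n_samples, n_classes = 8, 3
logits  = torch.randn(n_samples, n_classes)
targets = torch.randint(0, 2, (n_samples, n_classes)).float()
pos_weight = torch.tensor([1.0, 2.0, 4.0])  # one weight per class

loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss_2d = loss_fn(logits, targets)          # fine: weight broadcasts over the class axis

# Flattening destroys the class axis, so the 3-element pos_weight
# can no longer broadcast against the 24 flattened elements.
try:
    loss_fn(logits.view(-1), targets.view(-1))
    flattened_ok = True
except RuntimeError:
    flattened_ok = False                    # same size-mismatch error as in the thread
```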


Opened a git issue for this with a proposed fix:


This issue is resolved now. Thanks @rsomani95

So now, if we want a loss function for imbalanced multi-label classification:

loss_func = BCEWithLogitsLossFlat(pos_weight=pos_weight)

Since pos_weight is provided as an argument, fastai will then set flatten=False accordingly.


This would make the loss for the well-represented classes even higher… that would not be correct!
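For context on which direction the weights should go: pos_weight scales only the positive-target term of the loss, so a class given pos_weight > 1 contributes proportionally more loss whenever its label is positive. A minimal numeric sketch:

```python
import torch
import torch.nn as nn

logit  = torch.tensor([0.0])   # sigmoid(0) = 0.5
target = torch.tensor([1.0])   # positive label

base     = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.0]))(logit, target)
weighted = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([4.0]))(logit, target)
# weighted is exactly 4x base: to up-weight rare classes, they are the
# ones that should receive pos_weight > 1.
```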