Can label smoothing be used for multi-label images?

I am a student who finished Part 1 and am interested in applying label smoothing to a problem. I saw that it was taught in Part 2 so figured this would be a good place to ask my question.

I was wondering, though, if label smoothing can be applied to multi-label problems. Also, from what I have read about label smoothing online, it seems that the labels are usually replaced with smoothed versions, but that replacement is done inside the loss function, correct?
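For context, my understanding of the standard single-label formulation is that the one-hot target y is smoothed to \tilde{y}_k = (1-\epsilon)\,y_k + \frac{\epsilon}{N} over N classes, so the cross entropy with the smoothed target works out to

(1-\epsilon)(-\log p_y) + \frac{\epsilon}{N}\sum_k (-\log p_k)

which is why it can live entirely inside the loss function rather than in the stored labels (please correct me if that reading is off).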

See discussion in this thread: https://forums.fast.ai/t/is-label-smoothing-off-by-eps-n/44290


Thanks so much for your response!

It seems, based on this post, that it was possible to try label smoothing with multi-label:

However, they said the smoothed labels did not add up to 1, which seems important if they are to be matched up with probabilities.

Would it make sense to set the labels to \frac{1-\epsilon}{n} for those labeled 1 and \frac{\epsilon}{N-n} for those labeled 0, where n is the number of positive labels per data point and N is the total number of classes?

In terms of the loss for each data point with n positive labels (multi-hot encoded), it would be:

\sum_i\frac{(1-\frac{N-n}{N}\epsilon)}{n}(-\log(p_i)) + \sum_{j \neq i} \frac{\epsilon}{N}(-\log(p_j))

where i runs over the positive labels.

Does this seem correct?
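Here is a quick toy check of those coefficients (just a sketch; N = 5, n = 2, \epsilon = 0.1 and the probabilities are random stand-ins):

import torch

# toy check of the per-example loss above: N classes, n positive labels
N, n, eps = 5, 2, 0.1
pos = [0, 3]                              # indices of the positive labels
p = torch.softmax(torch.randn(N), dim=0)  # stand-in softmax probabilities

# smoothed target: (1 - (N-n)/N * eps) / n on positives, eps/N elsewhere
target = torch.full((N,), eps / N)
target[pos] = (1 - (N - n) / N * eps) / n
print(target.sum())                       # -> 1.0, so the smoothed labels sum to one

loss = -(target * p.log()).sum()          # the weighted sum written above
print(loss)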


Coming back to this, I realized I didn’t simplify it the way it was done for the regular single-label case. Here is the simplified version:
\begin{aligned} (1-\epsilon)\sum_i \left(-\frac{\log p_i}{n}\right) + \frac{\epsilon}{N} \sum_k (-\log p_k) \end{aligned}

where the first sum runs over the positive labels and the last term is the full cross entropy over all N classes (the sum of -\log p over every class).

I am unsure how to implement this. I see in the notebook there is loss = reduce_loss(-log_preds.sum(dim=-1), self.reduction) and also nll = F.nll_loss(log_preds, target, reduction=self.reduction). The output seems to be lin_comb(loss/c, nll, self.ε), which would be self.ε * loss/c + (1 - self.ε) * nll.

Is nll the cross entropy over all the classes? Because if so, shouldn’t it be multiplied by self.ε instead of (1 - self.ε)?
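To make the simplified formula above concrete, here is a rough, untested sketch of what it might look like as a standalone loss module (MultiLabelSmoothingCE is a made-up name, and it assumes softmax probabilities over N classes with multi-hot float targets; it is not the notebook’s class):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelSmoothingCE(nn.Module):
    "Rough sketch of the simplified multi-label smoothing loss above."
    def __init__(self, eps: float = 0.1):
        super().__init__()
        self.eps = eps

    def forward(self, output, target):
        # output: raw logits, shape (batch, N); target: multi-hot floats, shape (batch, N)
        N = output.size(-1)
        log_preds = F.log_softmax(output, dim=-1)
        n = target.sum(dim=-1).clamp(min=1)          # number of positive labels per example
        pos = -(target * log_preds).sum(dim=-1) / n  # mean of -log p_i over the positive labels
        uniform = -log_preds.sum(dim=-1) / N         # (1/N) * sum over all classes of -log p
        return ((1 - self.eps) * pos + self.eps * uniform).mean()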


Did you ever manage to implement label smoothing on multi-label? Would love to see it if so :slight_smile:

Unfortunately not. I don’t think I was able to get the math to work out in an intuitive way like it did for single-label, and I also ran into some problems during empirical tests.

However, since a couple other people have also asked about this, I might look into this again soon.


Ah alright – I might do the same then. Let me know if you figure anything out!

Did you manage to implement label smoothing for multi-label images?

Hi,

I don’t understand why the labels should sum to one in a multi-label scenario, since they usually don’t sum to one anyway. Also, can’t we simply modify the labels by a smoothing factor and use the new targets in BCEWithLogitsLoss?

For example, assuming that the labels do not have to add up to one, we can modify them so that the 1’s become 1-ε and the 0’s become ε. I implemented this logic in the following Callback (in my scenario I worked with a Callback rather than a new loss function because I wanted to compare the effect of label smoothing on my BCEWithLogitsLoss on the validation set):

import torch
import torch.nn as nn
from fastai.vision.all import *

class MultiLabelSmoothingCallback(Callback):
    "Recompute the training loss with smoothed multi-hot targets: 1 -> 1-ε, 0 -> ε."
    def __init__(self, ε: float = 0.05):
        self.ε = ε

    def before_backward(self):
        # before_backward only fires during training, so the validation loss stays un-smoothed
        labels = self.yb[0].detach().clone().float()
        # smooth the whole batch at once (no need to loop over the rows)
        labels = torch.where(labels == 1.0,
                             torch.full_like(labels, 1.0 - self.ε),
                             torch.full_like(labels, self.ε))
        # replace the loss that will be backpropagated
        self.learn.loss_grad = nn.BCEWithLogitsLoss()(self.pred, labels)
        self.learn.loss = self.learn.loss_grad.clone()
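
In case it helps, usage might look something like this (assuming learn is a fastai v2 Learner built for a multi-label problem, e.g. with MultiCategoryBlock; the epochs and learning rate are placeholders):

learn.add_cb(MultiLabelSmoothingCallback(ε=0.05))
learn.fit_one_cycle(5, 1e-3)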

WDYT?