 # Can label smoothing be used for multi-label images?

#1

I am a student who finished Part 1 and am interested in applying label smoothing to a problem. I saw that it was taught in Part 2, so I figured this would be a good place to ask my question.

I was wondering, though, whether label smoothing can be applied to multi-label problems. Also, from what I've read about label smoothing online, the labels are usually described as being replaced with smoothed labels, but in practice that replacement happens inside the loss function, correct?


(Jeremy Howard (Admin)) #2

See discussion in this thread: https://forums.fast.ai/t/is-label-smoothing-off-by-eps-n/44290


#4

Thanks so much for your response!

It seems, based on that thread, that it was possible to try label smoothing with multi-label data.

However, the poster there noted that the smoothed labels did not add up to 1, which seems important for matching them up with the predicted probabilities.

Would it make sense to set the labels to $\frac{1-\epsilon}{n}$ for those labeled 1 and $\frac{\epsilon}{N-n}$ for those labeled 0, where $n$ is the number of positive labels per data point and $N$ is the total number of classes?
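
Concretely, constructing such targets might look something like this (a minimal sketch; the function name is hypothetical, and `targets` is assumed to be a multi-hot 0/1 tensor of shape `(batch, N)`):

```python
import torch

def smooth_multilabel_targets(targets: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    # targets: (batch, N) multi-hot 0/1 tensor; assumes 0 < n < N per row
    N = targets.size(-1)
    n = targets.sum(dim=-1, keepdim=True)         # number of positives per example
    pos = (1 - eps) / n                           # value for entries labeled 1
    neg = eps / (N - n)                           # value for entries labeled 0
    return torch.where(targets.bool(), pos, neg)  # each row sums to 1 by construction
```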

In terms of the loss for each data point with $n$ positive labels (multi-hot encoded), it would be:

$$\sum_{i \in P}\frac{1-\frac{N-n}{N}\epsilon}{n}\bigl(-\log(p_i)\bigr) + \sum_{j \notin P} \frac{\epsilon}{N}\bigl(-\log(p_j)\bigr)$$

where $P$ is the set of positive labels for the data point. (Note this version smooths the negatives with $\frac{\epsilon}{N}$, i.e., it mixes the normalized targets with a uniform distribution, rather than using the $\frac{\epsilon}{N-n}$ scheme above.)
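
As a sanity check, this loss is just the dot product of the smoothed targets with $-\log p$. A minimal sketch in PyTorch (names here are hypothetical; `logits` are raw scores and `targets` is multi-hot):

```python
import torch.nn.functional as F

def smoothed_multilabel_ce(logits, targets, eps: float = 0.1):
    # logits: (batch, N) raw scores; targets: (batch, N) multi-hot 0/1
    N = targets.size(-1)
    n = targets.sum(dim=-1, keepdim=True)          # positives per example
    smoothed = (1 - eps) * targets / n + eps / N   # uniform-mixture targets
    log_probs = F.log_softmax(logits, dim=-1)
    return -(smoothed * log_probs).sum(dim=-1).mean()
```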

Does this seem correct?


#5

Coming back to this, I realized I hadn't simplified it the way it was done for regular (single-label) label smoothing. Here is the simplified version:

$$(1-\epsilon)\sum_{i \in P} \left(-\frac{\log p_i}{n}\right) + \frac{\epsilon}{N} \sum_{j=1}^{N} \bigl(-\log p_j\bigr)$$

where the last term is the full cross entropy summed over all $N$ classes.
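
In code, that simplified form might look like this (a sketch under the same assumptions as above, with hypothetical names):

```python
import torch.nn.functional as F

def smoothed_multilabel_ce_simplified(logits, targets, eps: float = 0.1):
    # logits: (batch, N) raw scores; targets: (batch, N) multi-hot 0/1
    N = targets.size(-1)
    n = targets.sum(dim=-1)                            # positives per example
    log_probs = F.log_softmax(logits, dim=-1)
    pos_term = -(targets * log_probs).sum(dim=-1) / n  # mean of -log p over the positives
    uniform_term = -log_probs.sum(dim=-1)              # sum of -log p over all N classes
    return ((1 - eps) * pos_term + (eps / N) * uniform_term).mean()
```

This should agree numerically with the unsimplified version in my previous post.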

I am unsure how to implement this. I see in the notebook there is a `loss = reduce_loss(-log_preds.sum(dim=-1), self.reduction)` and also `nll = F.nll_loss(log_preds, target, reduction=self.reduction)`. The output seems to be `lin_comb(loss/c, nll, self.ε)`, which works out to `self.ε * loss/c + (1 - self.ε) * nll`.
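
For reference, here is roughly how those pieces fit together in the notebook's single-label loss (reconstructed from the snippets above; the exact bodies of the `reduce_loss` and `lin_comb` helpers are my assumptions):

```python
import torch.nn.functional as F
from torch import nn

def reduce_loss(loss, reduction='mean'):
    # assumed helper: apply the requested reduction
    return loss.mean() if reduction == 'mean' else loss.sum() if reduction == 'sum' else loss

def lin_comb(v1, v2, beta):
    # assumed helper: beta * v1 + (1 - beta) * v2
    return beta * v1 + (1 - beta) * v2

class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, ε: float = 0.1, reduction='mean'):
        super().__init__()
        self.ε, self.reduction = ε, reduction

    def forward(self, output, target):
        c = output.size(-1)  # number of classes
        log_preds = F.log_softmax(output, dim=-1)
        # sum of -log p over all classes (the uniform term)
        loss = reduce_loss(-log_preds.sum(dim=-1), self.reduction)
        # standard cross entropy with the true class
        nll = F.nll_loss(log_preds, target, reduction=self.reduction)
        return lin_comb(loss / c, nll, self.ε)
```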

Is `nll` the cross-entropy term over all the classes? Because then shouldn't it be multiplied by `self.ε` instead of `(1 - self.ε)`?


(Zach Eberhart) #6

Did you ever manage to implement label smoothing on multi-label? Would love to see it if so!

#7

Unfortunately not. I wasn't able to get the math to work out as intuitively as it did for single-label, and I also ran into some problems during empirical tests.

However, since a couple of other people have also asked about this, I might look into it again soon.


(Zach Eberhart) #8

Ah alright – I might do the same then. Let me know if you figure anything out!
