Focal loss for a language model

Has anyone had success using Focal Loss to build a language model? With a large vocab, it seems like it could really help with predicting some of the less frequent words in the language. If you have tried this, please let me know. I am playing with it now, and will post results here (positive or negative) so that they can be tracked.
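For anyone curious what this would look like, here is a minimal sketch of focal loss applied to next-token prediction in plain PyTorch. The function name and defaults are my own; the idea is just to down-weight tokens the model already predicts confidently:

```python
import torch
import torch.nn.functional as F

def focal_loss_lm(logits, targets, gamma=2.0, ignore_index=-100):
    # logits: (n_tokens, vocab_size), targets: (n_tokens,)
    # Standard per-token cross-entropy, kept unreduced so we can reweight it.
    ce = F.cross_entropy(logits, targets, reduction='none',
                         ignore_index=ignore_index)
    pt = torch.exp(-ce)                # model's probability of the true token
    loss = ((1 - pt) ** gamma) * ce    # down-weight easy (high-pt) tokens
    mask = targets != ignore_index     # skip padding positions
    return loss[mask].mean()
```

With `gamma=0` this reduces to ordinary cross-entropy, so it is easy to A/B against the usual LM loss.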


Hey Bobak, were you able to get Focal Loss working for language modeling? Could you share your results?

Thanks for the ping. I never got any positive results. It did not seem to work any better.
I am trying out label smoothing now for the LM and for the classifier. I don’t see it working better for the LM yet, but I have not exhausted all the options.


Could you share the label smoothing snippet that you’re using?

Maybe relevant to label smoothing: I have a multi-label, multi-class problem where the correct labels for each sample are almost certain, but the labels that don't match a sample are not exact. I don't know what I should do in my loss function to put less pressure on the incorrect labels.

For example it’s like having multi-label for images but the annotators forgot to put all the correct labels for each image.
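One option for that setup (a sketch, not a tested recipe; the function name and the `neg_weight` value are my own) is to keep ordinary binary cross-entropy but scale down the contribution of the negative labels, since those are the ones that may be missing annotations:

```python
import torch
import torch.nn.functional as F

def asymmetric_bce(logits, targets, neg_weight=0.3):
    # targets: 0/1 multi-hot matrix. Positives are trusted; negatives may
    # simply be labels the annotators missed, so their loss is down-weighted.
    loss = F.binary_cross_entropy_with_logits(logits, targets,
                                              reduction='none')
    weights = torch.where(targets > 0.5,
                          torch.ones_like(loss),
                          torch.full_like(loss, neg_weight))
    return (loss * weights).mean()
```

Setting `neg_weight=1.0` recovers plain BCE, so you can tune how much you distrust the negatives.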

Here I write briefly about label smoothing.

This code should work well for you:

learn.loss_func = FlattenedLoss(LabelSmoothingCrossEntropy)
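For reference, the idea behind label smoothing can be sketched in plain PyTorch like this (this is a sketch of the technique, not fastai's exact `LabelSmoothingCrossEntropy` implementation): instead of pushing all probability mass onto the target class, you mix the usual NLL with a cross-entropy against the uniform distribution over classes.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    # Blend: (1 - eps) * NLL(target) + eps * CE(uniform distribution).
    log_preds = F.log_softmax(logits, dim=-1)
    nll = F.nll_loss(log_preds, targets)
    uniform = -log_preds.mean(dim=-1).mean()  # CE against uniform over classes
    return (1 - eps) * nll + eps * uniform
```

With `eps=0` this is ordinary cross-entropy; typical values are around `eps=0.1`.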