Focal loss for a language model

Has anyone had success using Focal Loss to build a language model? With a large vocabulary, it seems like it could really help with predicting some of the less frequent words in the language. If you have tried this out, please let me know. I am playing with it now and will post results here (positive or negative) so that they can be tracked.
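For context, here is a minimal PyTorch sketch of focal loss applied to next-token logits (not code from this thread; `gamma=2.0` is just the common default from the focal loss paper):

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, target, gamma=2.0):
        # logits: (batch, vocab_size); target: (batch,) token indices
        logp = F.log_softmax(logits, dim=-1)
        logp_t = logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)  # log p of true token
        p_t = logp_t.exp()
        # down-weight easy (high-probability) tokens by (1 - p_t)^gamma,
        # so rare, poorly predicted words contribute relatively more
        return -((1 - p_t) ** gamma * logp_t).mean()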


Hey Bobak, have you been able to get Focal Loss working for language modeling? Could you share your results?

Thanks for the ping. I never got any positive results; focal loss did not seem to work any better than plain cross entropy for me.
I am trying out label smoothing now for both the LM and the classifier. I don’t see it working better for the LM yet, but I have not exhausted all the options.


Could you share the label smoothing snippet that you’re using?

Maybe relevant to label smoothing: I have a multi-label, multi-class problem where the correct labels for each sample are almost certain, but the labels that don’t match a sample are not exact. I don’t know what I should do in my loss function to put less pressure on the incorrect labels.

For example, it’s like having multi-label image data where the annotators forgot to put all the correct labels on each image.
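Not an answer from this thread, but one common way to put less pressure on unreliable negative labels is to down-weight the negative term in binary cross entropy. A minimal sketch, where `neg_weight` is a hypothetical knob you would tune:

    import torch
    import torch.nn.functional as F

    def down_weighted_bce(logits, targets, neg_weight=0.3):
        # targets: multi-hot (batch, n_classes); positives are trusted,
        # negatives may be missing labels, so they get a smaller weight
        weights = torch.where(targets > 0.5,
                              torch.ones_like(targets),
                              torch.full_like(targets, neg_weight))
        return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)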

Here I write briefly about label smoothing.

This code should work well for you:

    from fastai.layers import FlattenedLoss, LabelSmoothingCrossEntropy

    # flattens model output and targets before applying the smoothed loss
    learn.loss_func = FlattenedLoss(LabelSmoothingCrossEntropy)
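If you want to see what that loss actually computes, here is a minimal sketch of label-smoothed cross entropy (my own paraphrase, not fastai’s exact source): the target distribution puts 1 - eps on the true class and spreads eps uniformly over all classes.

    import torch
    import torch.nn.functional as F

    def label_smoothing_ce(logits, target, eps=0.1):
        logp = F.log_softmax(logits, dim=-1)
        nll = F.nll_loss(logp, target)        # -log p of the true class
        smooth = -logp.mean(dim=-1).mean()    # cross entropy to the uniform distribution
        return (1 - eps) * nll + eps * smooth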