PolyLoss in fast.ai

Hi guys:

I just implemented PolyLoss here

and here is a tutorial on how to use it in fast.ai.
It is super easy to use: just create a PolyLossFlat class and use it to replace the cross-entropy loss.

import torch.nn.functional as F
from fastai.losses import BaseLoss
from fastcore.meta import use_kwargs_dict

class PolyLossFlat(BaseLoss):
    "Same as `PolyLoss`, but flattens input and target."
    y_int = True
    @use_kwargs_dict(softmax=True, ce_weight=None, reduction='mean', epsilon=1.)
    def __init__(self, **kwargs): super().__init__(PolyLoss, **kwargs)  # PolyLoss is the implementation linked above
    def decodes(self, x): return x.squeeze(1).argmax(dim=1)
    def activation(self, x): return F.softmax(x, dim=1)
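A minimal usage sketch (assuming a standard fastai vision setup where dls is your DataLoaders; on older fastai versions vision_learner may be cnn_learner):

from fastai.vision.all import *

# Swap the default cross-entropy loss for PolyLossFlat when building the Learner
learn = vision_learner(dls, resnet18, loss_func=PolyLossFlat(epsilon=1.), metrics=accuracy)
learn.fine_tune(5)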

PolyLoss is a general framework and can naturally subsume cross-entropy loss - the paper just came out last week and it is fantastic (read it here: [2204.12511v1] PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions). If anyone is interested in testing it out, it would be interesting to hear your results.
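For reference, the core idea from the paper (my paraphrase): cross-entropy expands as an infinite polynomial in (1 - P_t), and PolyLoss perturbs its leading coefficients; Poly-1 just adds epsilon to the first term:

$$
L_{CE} = -\log(P_t) = \sum_{j=1}^{\infty} \frac{1}{j}\,(1 - P_t)^j,
\qquad
L_{\text{Poly-1}} = -\log(P_t) + \epsilon_1\,(1 - P_t)
$$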

Thanks!

YiYi

12 Likes

@ilovescience, @KevinB, and I have been playing with it on the Discord with mixed results, but we're still doing more experiments.

3 Likes

Really nice implementation! I tried implementing it here, but I think your version is much more complete.

3 Likes

Also, in case you want to quickly hack it into your pipeline, it shouldn't be too difficult for the Poly-1 loss. See the screenshot from the paper. (Reminded me of the Focal Loss paper, which was also a seemingly minor but impactful change.)
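For anyone who wants that quick hack, here is a rough standalone sketch of Poly-1 (the paper's formulation is -log(P_t) + epsilon * (1 - P_t); the function name and defaults below are just mine):

import torch.nn.functional as F

def poly1_cross_entropy(logits, target, epsilon=1.0, reduction='mean'):
    # Standard per-sample cross-entropy
    ce = F.cross_entropy(logits, target, reduction='none')
    # Probability assigned to the true class
    pt = F.softmax(logits, dim=-1).gather(1, target.unsqueeze(1)).squeeze(1)
    # Poly-1: cross-entropy plus epsilon * (1 - p_t)
    loss = ce + epsilon * (1 - pt)
    if reduction == 'mean': return loss.mean()
    if reduction == 'sum':  return loss.sum()
    return loss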

Hey Kevin:

I read your post about the accuracy difference in resnet18 - that was awesome!

Did you try different values of epsilon? My results from initial experiments on various classification tasks are similar to yours - I'm able to see a small increase in accuracy in almost all cases. Based on the paper, PolyLoss helps correct the confidence of predictions and seems to drive a larger performance increase on less balanced datasets - I will do more experiments and hopefully have something interesting to share :)

1 Like

I have tried a few epsilons, but I don't have anything conclusive yet.

Here are my results from training a model from scratch on 256x256 images from ImageNette for 200 epochs. I followed the training recipe from here. Accuracy scores from 5 runs are reported:

Regular (with LabelSmoothingCrossEntropy loss function):
0.960255, 0.957452, 0.955924, 0.957452, 0.958217
Mean=0.957860
Std=0.00141

Epsilon=1
0.957962, 0.960000, 0.956943, 0.961529, 0.959490
Mean=0.959185
Std=0.00160

Epsilon=2
0.955159, 0.959490, 0.954904, 0.959236, 0.957962
Mean=0.957350
Std=0.001964

Epsilon=5
0.951847, 0.949045, 0.953121, 0.952102, 0.953631
Mean=0.951949
Std=0.001592

Epsilon=-1
0.958217, 0.955924, 0.959490, 0.959236, 0.955414
Mean= 0.957656
Std=0.001685

Seems like epsilon=+1 is the best on ImageNette and shows a slight gain over the LabelSmoothingCrossEntropy baseline. I might test fine-tuning in the future as well.

3 Likes

I will try one of the datasets where I keep getting the same or similar test performance results. I think this is a better way to test this loss function, because the test results are sometimes unstable. I will report back here later.

Update:
It looks like it is only helpful for an unbalanced dataset, but I still need to do more testing. I tested it on my example and I do not see a great improvement at the end. However, I did see training accuracy increase very quickly between the first and second epochs at the beginning. I can't say anything definitive yet, because it still needs more testing.

They mentioned in the paper that Poly-1 does not show any improvement over label smoothing in their experiments. I wonder if it would make sense to use label smoothing and PolyLoss together? One changes the distribution of the target, the other changes the distribution of the prediction…
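A rough sketch of what that combination might look like (this is just my guess at combining the two, not something from the paper; it assumes PyTorch >= 1.10 for the label_smoothing argument):

import torch.nn.functional as F

def poly1_label_smoothing_ce(logits, target, epsilon=1.0, smoothing=0.1):
    # Label-smoothed cross-entropy per sample (changes the target distribution)
    ce = F.cross_entropy(logits, target, label_smoothing=smoothing, reduction='none')
    # Probability of the true class (the Poly-1 term reshapes the prediction side)
    pt = F.softmax(logits, dim=-1).gather(1, target.unsqueeze(1)).squeeze(1)
    return (ce + epsilon * (1 - pt)).mean()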