Also, in case you want to quickly hack it into your pipeline, it shouldn’t be too difficult for the Poly-1 loss. See the screenshot from the paper. (It reminded me of the Focal Loss paper, which was also a seemingly minor but impactful change.)
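For anyone who wants to try it without waiting for a library, here is a minimal PyTorch sketch of the Poly-1 formulation from the paper (cross-entropy plus epsilon times one minus the probability of the true class). The function name and the default epsilon are my own choices, not from the paper:

```python
import torch
import torch.nn.functional as F

def poly1_cross_entropy(logits, labels, epsilon=1.0):
    """Poly-1 loss: CE + epsilon * (1 - p_t), where p_t is the
    predicted probability of the true class."""
    # per-sample cross-entropy
    ce = F.cross_entropy(logits, labels, reduction="none")
    # probability assigned to the correct class for each sample
    pt = F.softmax(logits, dim=-1).gather(1, labels.unsqueeze(1)).squeeze(1)
    return (ce + epsilon * (1.0 - pt)).mean()
```

With `epsilon=0` this reduces to plain cross-entropy, so it should be a drop-in replacement for `F.cross_entropy` in most training loops.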
I read your post about the accuracy difference in resnet18 - that was awesome!
Did you try different values of epsilon? My results from initial experiments on various classification tasks are similar to yours - I’m able to see a small increase in accuracy in almost all cases. Based on the paper, PolyLoss helps correct the confidence of predictions and seems to drive a larger performance increase on less balanced datasets - I will do more experiments and hopefully have something interesting to share :)
Here are my results from training a model from scratch on 256x256 images from ImageNette for 200 epochs. I followed the training recipe from here. Accuracy scores of 5 runs are reported:
I will try one of the datasets on which I keep getting the same or similar test performance across runs. I think that is a better way to evaluate this loss function, because test results can otherwise be unstable. I will report back here later.
Updated:
It looks like it is only helpful for unbalanced datasets, but I still need to do more testing. On my example I do not see a great improvement at the end, although training accuracy did increase very quickly between the first and second epochs. I can’t say anything definitive yet; it needs more testing.
They mention in the paper that Poly-1 does not show any improvement over label smoothing in their experiments. I wonder if it would make sense to use label smoothing and PolyLoss together? One changes the distribution of the target, the other changes the distribution of the prediction…
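One possible way to combine the two - this is my own guess at how they could compose, not something taken from the paper: let label smoothing reshape the target inside the cross-entropy term, and compute the Poly-1 term’s p_t as the prediction’s overlap with the smoothed target rather than with the hard label. The function name and defaults are hypothetical:

```python
import torch
import torch.nn.functional as F

def poly1_ce_with_smoothing(logits, labels, epsilon=1.0, smoothing=0.1):
    """Hypothetical combination: label smoothing changes the target
    distribution, the Poly-1 term penalizes the prediction's distance
    from it via epsilon * (1 - p_t)."""
    n_classes = logits.size(-1)
    # smoothed cross-entropy (built into PyTorch >= 1.10)
    ce = F.cross_entropy(logits, labels, label_smoothing=smoothing,
                         reduction="none")
    probs = F.softmax(logits, dim=-1)
    # smoothed one-hot targets
    target = (F.one_hot(labels, n_classes).float() * (1.0 - smoothing)
              + smoothing / n_classes)
    # expected probability under the smoothed target distribution
    pt = (probs * target).sum(dim=-1)
    return (ce + epsilon * (1.0 - pt)).mean()
```

With `smoothing=0` this falls back to the plain Poly-1 loss, so it would be easy to A/B test the two in the same run.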