Just finished reading this paper: https://arxiv.org/abs/1812.01187
Wondering if anyone has tried to implement those tricks in fastai/PyTorch, especially the training tricks like label smoothing, knowledge distillation, and mixup augmentation. If not, I'm actually interested in giving it a shot.
Do I understand it correctly that setting loss_func = LabelSmoothingCrossEntropy() is all we have to do to implement label smoothing? I.e., we don't have to additionally change the labels before training, right?
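For reference, here is my understanding of what label smoothing computes, as a minimal pure-Python sketch (not fastai's actual implementation): the targets stay plain integer class indices, and the loss itself blends the usual cross-entropy with a uniform term over all classes, weighted by a smoothing factor eps.

```python
import math

def label_smoothing_ce(logits, target, eps=0.1):
    """Cross-entropy with label smoothing for a single example.

    logits: list of raw scores, one per class
    target: integer class index (no one-hot / smoothed vector needed)
    eps:    smoothing factor; eps=0 recovers plain cross-entropy
    """
    # numerically stable log-softmax
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_sum for x in logits]

    # standard NLL on the hard target
    nll = -log_probs[target]
    # uniform component: mean of -log p over all classes
    smooth = -sum(log_probs) / len(log_probs)

    # blend: (1 - eps) * hard-target loss + eps * uniform loss
    return (1 - eps) * nll + eps * smooth
```

So if this sketch matches what the library does, the answer would be yes: swapping in the loss function is enough, since the smoothing is applied inside the loss rather than by modifying the labels.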