I ran some experiments using mixup and label smoothing on a large image classification dataset. Since the dataset is large, I decided to run only 5 epochs and compare. In both cases, neither training nor validation loss improved over the baseline without mixup and label smoothing, and accuracy did not improve either.
However, it seems that the benefits of mixup may only become apparent with longer training, as mentioned here. The same may be true of other tricks like label smoothing. So how much should I rely on these results when deciding whether to use tricks like mixup and label smoothing? Are there better ways to judge the effectiveness of these tricks on my dataset without running the full training? Also, what have your experiences been with these tricks when running a small number of epochs vs. a large number of epochs?
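For reference, here is a minimal sketch of what the two tricks do to the training targets, in plain Python rather than any particular library's implementation. The function names and the values `eps=0.1` and `alpha=0.4` are illustrative assumptions, not settings taken from the experiments above.

```python
import random

def smooth_labels(label, num_classes, eps=0.1):
    """Label smoothing: spread eps uniformly over all classes and
    put the remaining (1 - eps) on the true class."""
    off = eps / num_classes
    return [off + (1.0 - eps) if k == label else off for k in range(num_classes)]

def mixup_pair(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Mixup: blend two inputs and their target vectors with a weight
    lam drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Both tricks make the targets soft, so the lowest reachable cross-entropy is above zero. A training loss measured with these tricks may therefore not be directly comparable to a baseline loss measured against hard labels.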
Thanks. Right now I am training with a fixed seed, so all the results are deterministic. I would do 20 runs, but my dataset is relatively large and it takes 1-2 hours to do 5 epochs. However, I will retry mixup.
I was going to say the same. Based on my testing on Food-101, label smoothing and mixup together pushed accuracy up by about 2% compared with training at the same image size without these two tricks.
If you want, you can check my test here. Glad I didn’t delete the repo…
I meant 20 runs of 5 epochs each if you wanted to get the full story on Imagewoof, since you were asking about running a small number of epochs vs. a large number of epochs, and I gave you what happens with 100 epochs.
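A rough sketch of how repeated short runs could be compared, assuming you collect one final accuracy per run (each with a different seed) into a list. The helper names are hypothetical, and Welch's t statistic is just one reasonable way to check whether a gap is larger than run-to-run noise.

```python
import math
import statistics

def summarize(accs):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(accs), statistics.stdev(accs)

def welch_t(a, b):
    """Welch's t statistic for two groups of runs with possibly
    unequal variances. Each group needs at least two runs."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (mean_a - mean_b) / std_err
```

Calling `welch_t(accs_with_tricks, accs_baseline)` on the two lists gives a value near zero when the difference between the means is small relative to the spread across seeds. In that case a single 5-epoch comparison probably should not be trusted either way.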