Testing mixup and label smoothing on a dataset for only 5 epochs

ilovescience · September 17, 2019, 10:23pm

I tried some experiments with mixup and label smoothing on a large image classification dataset. Since it was large, I only ran 5 epochs for each comparison. Neither mixup nor label smoothing improved training loss, validation loss, or accuracy compared with training without them.
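For readers new to the two tricks, here is a minimal, framework-free sketch of what they do to a batch. It assumes one-hot class targets, a single mixup coefficient λ drawn from Beta(α, α) per batch, partners chosen by shuffling the batch, and label smoothing that moves ε of the target mass uniformly over all K classes. The function names and the default α = 0.4 are hypothetical choices for illustration, not any particular library's API.

```python
import math
import random


def one_hot(label, num_classes):
    """Return a one-hot target vector for an integer class label."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def smooth(target, eps):
    """Label smoothing: keep (1 - eps) of the mass on the target, spread eps uniformly."""
    k = len(target)
    return [(1.0 - eps) * t + eps / k for t in target]


def mixup_batch(xs, ys, alpha=0.4, rng=random):
    """Mix each example with a randomly chosen partner from the same batch.

    xs: list of feature vectors (lists of floats); ys: list of target vectors.
    One lambda ~ Beta(alpha, alpha) is drawn for the whole batch (assumed here).
    """
    lam = rng.betavariate(alpha, alpha)
    perm = list(range(len(xs)))
    rng.shuffle(perm)  # partner j for each example i
    mixed_x = [[lam * a + (1.0 - lam) * b for a, b in zip(xs[i], xs[j])]
               for i, j in enumerate(perm)]
    mixed_y = [[lam * a + (1.0 - lam) * b for a, b in zip(ys[i], ys[j])]
               for i, j in enumerate(perm)]
    return mixed_x, mixed_y, lam


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def soft_cross_entropy(logits, target):
    """Cross-entropy against a soft (smoothed and/or mixed) target distribution."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs))
```

Because the soft cross-entropy is linear in the target, training on a mixed target gives the same loss as λ·CE(y_i) + (1 − λ)·CE(y_j), which is the form in which the mixup loss is often written. Both tricks make the targets softer, which is one plausible reason their effect can look neutral or slightly negative in very short runs.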

However, the benefits of mixup may only become apparent with longer training, as mentioned here. The same may be true of other tricks such as label smoothing. How much should I rely on these results when deciding whether to use tricks like mixup and label smoothing? Are there better ways to judge how effective these tricks are on my dataset without running the full training? Also, what have your experiences with these tricks been when training for a small versus a large number of epochs?

Seb · September 18, 2019, 1:51am

On Imagewoof-128px (using the 160px version), an xresnet18 trained for 100 epochs reached 85.76% accuracy without mixup and 86.36% with mixup.

This was over enough runs (23 and 15 respectively) to make the result statistically significant (p<0.05 at least) given the variance.
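The post doesn't say which test was used. One way to check whether a gap like 86.36% vs 85.76% exceeds run-to-run noise is a two-sided permutation test on the difference of mean accuracies, sketched below in plain Python. The permutation test is a swapped-in illustration (Welch's t-test is another common choice), and the function name and defaults are hypothetical.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between two groups of runs.

    a, b: per-run metrics, e.g. final accuracy of each run with and without mixup.
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the run labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

The test makes no normality assumption about the per-run accuracies, which is convenient when each group only has 15 to 25 runs.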

You could test 5-epoch runs with and without mixup, averaging over 20 runs each, and see what happens. This repo should do the trick.
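The protocol suggested here (many short runs per configuration, then compare averages) can be written as a small harness. `train_and_evaluate(seed, use_mixup)` is a hypothetical placeholder for your own 5-epoch training function that returns a validation accuracy; the harness only handles giving each run a distinct seed and summarizing the results.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List, Tuple


def compare_configs(
    train_and_evaluate: Callable[[int, bool], float],
    n_runs: int = 20,
    base_seed: int = 0,
) -> Tuple[Dict[str, Dict[str, float]], Dict[str, List[float]]]:
    """Run each configuration n_runs times with distinct seeds and summarize accuracy.

    train_and_evaluate(seed, use_mixup) is a hypothetical user-supplied function
    (e.g. a 5-epoch training run) that returns a validation accuracy.
    """
    results: Dict[str, List[float]] = {"baseline": [], "mixup": []}
    for run in range(n_runs):
        # One seed per run, reused for both configurations so each pair of runs
        # shares whatever the seed controls (e.g. weight initialization).
        seed = base_seed + run
        results["baseline"].append(train_and_evaluate(seed, False))
        results["mixup"].append(train_and_evaluate(seed, True))
    summary: Dict[str, Dict[str, float]] = {}
    for name, accs in results.items():
        summary[name] = {
            "mean": mean(accs),
            "std": stdev(accs) if len(accs) > 1 else 0.0,
            "n": float(len(accs)),
        }
    return summary, results
```

The raw per-run lists are returned alongside the summary so they can be fed into whatever significance test you prefer.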

ilovescience · September 18, 2019, 2:22am

Thanks. Right now I am training with a fixed seed, so all the results are deterministic. I would do 20 runs, but my dataset is relatively large and 5 epochs take 1-2 hours. However, I will retry mixup.
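With a single fixed seed, each configuration yields exactly one sample, so a comparison cannot show whether a difference exceeds run-to-run noise. When runs are expensive, a rough power calculation can estimate how many seeds per configuration are needed to detect a given accuracy gap. The sketch below uses the standard normal-approximation formula for comparing two means, n ≈ 2(z₁₋α/₂ + z₁₋β)²σ²/δ², which is an assumption brought in for illustration, not something from this thread; σ has to come from a few pilot runs with different seeds.

```python
import math
from statistics import NormalDist


def runs_per_config(delta, sigma, alpha=0.05, power=0.8):
    """Approximate runs needed per configuration to detect a mean difference `delta`.

    Normal-approximation formula for a two-sided, two-sample comparison with
    equal group sizes and a common run-to-run standard deviation `sigma`:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 * sigma**2 / delta**2
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)
```

For example, with hypothetical pilot runs showing σ = 0.5 percentage points and a target gain of δ = 0.5 points, `runs_per_config(0.5, 0.5)` returns 16 runs per configuration. Halving the detectable gap roughly quadruples the number of runs, which is why small improvements are expensive to confirm.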

heye0507 · September 18, 2019, 3:57am

I was going to say the same. Based on my testing on Food-101, label smoothing and mixup together pushed accuracy up by about 2% compared with training at the same image size without these two tricks.

If you want, you can check my test here. Glad I didn’t delete the repo…

github.com

heye0507/dl_related/blob/master/fellowship/Res50 with Mixup labelSmoothing.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "\n",
    "In this notebook, mixup and label smoothing are applied in comparison with the previous resnet-50 final model. We can see that the top-1 accuracy increases with less training time. \n",
    "\n",
    "Please note that the model is only trained with 224 image sizes, with top-1 accuracy beyond 87%, while the resnet-50 model needs another 6 hours of training on a P100 GPU to reach 88% accuracy with the same data / augmentation (besides mixup) \n",
    "\n",
    "Also, the mixup - label smoothing model is still underfitting, so it can be trained longer for better top-1 accuracy. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],

This file has been truncated.

Seb · September 18, 2019, 11:33am

I meant 20 runs of 5 epochs on Imagewoof, if you want the full story: you were asking about a small versus a large number of epochs, and I only gave you what happens with 100 epochs.