CutMix > Mixup (?)

A paper from last month introduces CutMix, which the authors show outperforms Mixup and Cutout on a variety of datasets, including ImageNet.

CutMix = combining two images, but instead of blending them by opacity (Mixup), it simply creates a new image from two rectangles that are subsets of the original photos.
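
For reference, this is the formulation from the paper: a binary mask $\mathbf{M}$ decides which pixels come from each image, and the labels are mixed by the combination ratio $\lambda$ (sampled from a Beta distribution):

$$\tilde{x} = \mathbf{M} \odot x_A + (\mathbf{1} - \mathbf{M}) \odot x_B, \qquad \tilde{y} = \lambda\, y_A + (1 - \lambda)\, y_B$$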

[image: cutmix illustration]

github repo:

From the paper at least, CutMix outperforms Mixup across the board.
Thoughts?

17 Likes

It would be interesting to try on Imagenette and Imagewoof; it doesn’t seem hard to implement, maybe as a Callback?

1 Like

Yes, definitely. I may take a crack at it this weekend after I get my presentation out of the way.

I would just leverage the Mixup code and modify it for CutMix (as a callback, exactly as you noted):

Here’s the actual code from the CutMix repository:

for i, (input, target) in enumerate(train_loader):
    # measure data loading time
    data_time.update(time.time() - end)

    input = input.cuda()
    target = target.cuda()

    r = np.random.rand(1)
    if args.beta > 0 and r < args.cutmix_prob:
        # generate mixed sample
        lam = np.random.beta(args.beta, args.beta)
        rand_index = torch.randperm(input.size()[0]).cuda()
        target_a = target
        target_b = target[rand_index]
        bbx1, bby1, bbx2, bby2 = rand_bbox(input.size(), lam)
        input[:, :, bbx1:bbx2, bby1:bby2] = input[rand_index, :, bbx1:bbx2, bby1:bby2]
        # compute output
        input_var = torch.autograd.Variable(input, requires_grad=True)
        target_a_var = torch.autograd.Variable(target_a)
        target_b_var = torch.autograd.Variable(target_b)
        output = model(input_var)
        loss = criterion(output, target_a_var) * lam + criterion(output, target_b_var) * (1. - lam)
    else:
        # compute output
        input_var = torch.autograd.Variable(input, requires_grad=True)
        target_var = torch.autograd.Variable(target)
        output = model(input_var)
        loss = criterion(output, target_var)

    # measure accuracy and record loss
    err1, err5 = accuracy(output.data, target, topk=(1, 5))

    losses.update(loss.item(), input.size(0))
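
The `rand_bbox` helper isn’t shown above; here’s a sketch of what it does, reconstructed from the paper’s description rather than copied from the repo (note the clipping, which is relevant to the λ discussion further down):

```python
import numpy as np

def rand_bbox(size, lam):
    # sketch: sample a box covering roughly (1 - lam) of the image area
    W, H = size[2], size[3]
    cut_rat = np.sqrt(1. - lam)   # side ratio, so the area ratio is 1 - lam
    cut_w, cut_h = int(W * cut_rat), int(H * cut_rat)

    # sample the box center uniformly, then clip the box to the image;
    # the clipped box can end up covering less than (1 - lam) of the image
    cx, cy = np.random.randint(W), np.random.randint(H)
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2
```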
2 Likes

Very interesting. I wonder how, or if, it works for tabular data. I’ve been meaning to explore Mixup there, but I’m already using batchwise swap noise, and all I’d need to change are my targets.
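
In case it helps, a minimal sketch of the batchwise swap-noise idea (illustrative only, assuming a dense float feature tensor):

```python
import torch

def swap_noise(x: torch.Tensor, p: float = 0.15) -> torch.Tensor:
    # for each cell, with probability p, replace the value with the one
    # in the same column of a randomly chosen row from the batch
    mask = torch.rand_like(x) < p
    rand_rows = torch.randint(0, x.size(0), x.shape, device=x.device)
    swapped = torch.gather(x, 0, rand_rows)
    return torch.where(mask, swapped, x)
```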

Hey Even,

Have you found good performance on tabular data with swap noise? I’m particularly interested in data augmentation techniques for tabular data myself.

1 Like

I’ve used it quite effectively in a denoising autoencoder, but with the fastai tabular model it wasn’t as effective as I’d hoped. It does seem to improve things somewhat, but it adds another hyperparameter to tune, so it’s not a clear win.

I’d love to hear about what other methods you’ve used or heard about in the past. My main role right now is developing preprocessing and models for tabular data and working to improve their performance on the GPU side.

Cool. Yeah, I’ve had some success with DAEs and swap noise for categorical variables plus random noise for continuous variables. Still nothing ground-breaking. I haven’t had time to put any code together yet, but I’ve been ruminating a bit over augmenting training records with GANs. I think the same idea applies in CV, but maybe it’s not worth it there because building a good GAN is itself so much trouble.

1 Like

Just tried it on food-101 (github).

Didn’t see much of a difference vs. no augmentations at 224x224, with 1 cutmix’ed image per batch of 128.

Thanks for posting your results!
Question - how are you controlling the % split between the images? I read in another paper that it makes a big difference.
I only took a quick look at your github, but you have a probability input coming in that isn’t used?
I’m thus wondering if the cutmix could frequently end up with so little of one image vs the other (say 10% and 90%) that the 10% isn’t useful to learn from…
Anyway, I’m planning to set it up for a project shortly, and I’ll post the results I get.

The problem was in my implementation. It looks on par with the standard fast.ai aug results now.

1 Like

This looks very similar to RICAP.

6 Likes

Wow, good call - CutMix is basically identical, except that CutMix uses 2 images and RICAP uses 4… that’s about the only difference I see from a quick read of the RICAP paper.

Guess that means I can write a new paper on my brand new, highly innovative “Trimix”, where I use 3 images :)

7 Likes

Originally invented here I believe: https://www.kaggle.com/c/state-farm-distracted-driver-detection/discussion/22627#129848

4 Likes

I’ve created a repo with callback implementations of ricap and cutmix.
They can be used in the same way as mixup.
I have tested the image transformations, and they seem to be ok.
So far I have only used them on a proprietary image dataset, not on any other dataset.
If you want to use them, please feel free to do so.
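
A usage sketch, assuming the repo patches `Learner` the same way fastai v1’s `mixup()` does (the `cutmix()` / `ricap()` names are from the repo; `data` is a placeholder `ImageDataBunch`):

```python
from fastai.vision import *  # fastai v1

learn = cnn_learner(data, models.resnet34, metrics=accuracy).cutmix()
# or .ricap(); see the note below about cutmix(true_λ=False)
learn.fit_one_cycle(10)
```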

Potential original cutmix issue:
When coding cutmix, I realized there’s a potential issue with the calculation of λ (the % of the image and label that are mixed).
The calculated area to mix sometimes falls partially outside the image and is clipped out, but the λ value is not corrected. This means that the % of image modified and the % of label mixed don’t match. I’ve added a parameter true_λ that fixes the issue (by default it’s now set to true, which means the modified cutmix version is used; if you want the original cutmix behaviour, just use cutmix(true_λ=False)).

14 Likes

@oguiza - this looks fantastic! thanks for coding this up and sharing.

(You just saved me half a day as I had cutmix on my todo list, plus nice bonus of adding ricap which I had no plans to do lol.)

re: potential original cutmix issue - that’s a great catch, thanks for fixing.

I’ll be using both of these shortly for a project I’m working on.
It will be interesting to see whether ricap does better, worse, or the same - I’ll try running on imagenette for 10 epochs each as a controlled test to see if any difference can be spotted.

That’d be interesting, although I doubt you’ll find any improvement with so few epochs. Jeremy seems to have found an improvement using mixup for 80 epochs.

The bag of tricks paper uses mixup for 200 epochs on ImageNet, BTW.

2 Likes

This looks great! Maybe include the notebook too, so you could add some examples and explanations? :)

Ok, I’ll create a notebook next week.

Ah, sorry - I assumed you already had one, since there’s a “do not edit” comment at the top saying it’s from a notebook. I guess it must just have been carried over from another file.