CutMix > Mixup (?)

This looks very similar to RICAP


Wow, good call. CutMix is basically identical, except that CutMix uses 2 images and RICAP uses 4. That’s about the only difference I see from a quick read of RICAP.
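For anyone who wants to see the 4-image idea concretely, here is a minimal NumPy sketch of a RICAP-style combination. It is not the code from any repo in this thread. The function name `ricap_sketch`, the `beta` default and the one-hot label format are assumptions made for illustration. It picks a random boundary point, fills the four resulting rectangles with random crops from four images, and weights the labels by area.

```python
import numpy as np

def ricap_sketch(images, labels, beta=0.3, rng=None):
    """Combine four images into one RICAP-style image (illustrative sketch).

    images: array of shape (4, H, W, C)
    labels: one-hot array of shape (4, n_classes)
    """
    rng = np.random.default_rng() if rng is None else rng
    _, H, W, _ = images.shape

    # Boundary point that splits the canvas into four rectangles.
    w = int(round(W * rng.beta(beta, beta)))
    h = int(round(H * rng.beta(beta, beta)))

    heights = [h, h, H - h, H - h]
    widths = [w, W - w, w, W - w]
    corners = [(0, 0), (0, w), (h, 0), (h, w)]  # (row, col) of each rectangle

    out = np.zeros_like(images[0])
    weights = np.zeros(4)
    for k in range(4):
        ph, pw = heights[k], widths[k]
        # Random crop of size (ph, pw) taken from image k.
        y0 = int(rng.integers(0, H - ph + 1))
        x0 = int(rng.integers(0, W - pw + 1))
        r, c = corners[k]
        out[r:r + ph, c:c + pw] = images[k, y0:y0 + ph, x0:x0 + pw]
        # Label weight is the share of the canvas this patch covers.
        weights[k] = ph * pw / (H * W)

    mixed_label = weights @ labels
    return out, mixed_label, weights
```

CutMix would be the 2-image analogue: one rectangle from a second image is pasted into the first, and the label weights are again the area shares.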

Guess that means I can write a new paper on my brand new, highly innovative “Trimix” where I use 3 images :)


Originally invented here I believe: https://www.kaggle.com/c/state-farm-distracted-driver-detection/discussion/22627#129848


I’ve created a repo with callback implementations of ricap and cutmix.
They can be used in the same way as mixup.
I have tested the image transformations and they seem to be OK.
I have only used them on a proprietary image dataset and have not tested them on any other dataset.
If you want to use it, please feel free to do so.

Potential original cutmix issue:
When coding cutmix, I’ve realized there’s a potential issue with the calculation of λ (% of the image and label mixed).
The calculated area to mix sometimes falls partially outside the image and gets clipped, but the λ value is not corrected. This means that the % of the image modified and the % of the label mixed don’t match. I’ve added a parameter true_λ that fixes the issue. By default it’s now set to True, which means that the modified cutmix version is used. If you want to use the original cutmix version, just use: cutmix(true_λ=False)
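To illustrate the mismatch, here is a small NumPy sketch. It is not the callback code from the repo. The names `cutmix_box`, `cutmix_pair` and the flag `true_lam` (an ASCII stand-in for true_λ) are made up for this example, and `lam` is assumed to be the share of the first image that is kept. The box is sampled, clipped to the image, and then λ is optionally recomputed from the area that was actually pasted.

```python
import numpy as np

def cutmix_box(H, W, lam, rng):
    """Sample a box meant to cover (1 - lam) of an H x W image, then clip it."""
    cut_h = int(H * np.sqrt(1.0 - lam))
    cut_w = int(W * np.sqrt(1.0 - lam))
    cy = int(rng.integers(0, H))  # box centre, anywhere in the image
    cx = int(rng.integers(0, W))
    # Clipping shrinks the box whenever it sticks out of the image.
    y1 = int(np.clip(cy - cut_h // 2, 0, H))
    y2 = int(np.clip(cy + cut_h // 2, 0, H))
    x1 = int(np.clip(cx - cut_w // 2, 0, W))
    x2 = int(np.clip(cx + cut_w // 2, 0, W))
    return y1, y2, x1, x2

def cutmix_pair(img_a, img_b, lab_a, lab_b, lam, rng, true_lam=True):
    """Paste a box from img_b into img_a and mix the labels (hypothetical helper)."""
    H, W = img_a.shape[:2]
    y1, y2, x1, x2 = cutmix_box(H, W, lam, rng)
    out = img_a.copy()
    out[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    if true_lam:
        # Recompute lam from the clipped box so image and label shares agree.
        lam = 1.0 - (y2 - y1) * (x2 - x1) / (H * W)
    return out, lam * lab_a + (1.0 - lam) * lab_b, lam
```

With `true_lam=False` the label keeps the sampled λ even when the pasted box ended up smaller, which is the mismatch described above.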


@oguiza - this looks fantastic! thanks for coding this up and sharing.

(You just saved me half a day as I had cutmix on my todo list, plus nice bonus of adding ricap which I had no plans to do lol.)

re: potential original cutmix issue - that’s a great catch, thanks for fixing.

I’ll be using both of these shortly for a project I’m working on.
Will be interesting to see if ricap does better/worse/same - I will try to run on imagenette for 10 epochs each to see if any diff can be spotted as a controlled test.

That’d be interesting, although I doubt you’ll find any improvement with so few epochs. Jeremy seems to have found an improvement using mixup for 80 epochs.

The bag of tricks paper uses mixup for 200 epochs on Imagenet BTW.


This looks great! Maybe include the notebook too, so you could add some examples and explanation? :)

Ok, I’ll create a notebook next week.

Ah sorry I assumed you already had one, since there’s a “do not edit” comment at the top saying it’s from a notebook. I guess it must just be carried over from another file.

Yes, I used a nb to build the code, but it doesn’t contain anything else.
I can add some examples to help visualize how each of the callbacks work, and run some tests to check whether they improve performance or not. I’m interested in that myself, and I’m happy to share.


.ricap() and .cutmix() now available

I’m sorry for the delay, but I’ve been busy with some other stuff.
I’ve finally created a notebook that contains examples of how ricap and cutmix can be used. You just need to add .ricap() or .cutmix() to learn, in the same way we do with mixup.

I’ve also created some functionality to visualize how the single-image transforms (flip, rotate, etc.) and the multi-image transforms (mixup, ricap and cutmix) work. I think it really helps if you can see the output of these transformations.

I’ve run some very brief tests on Imagenette to check that the callbacks work correctly, and performance was: ricap > cutmix > mixup. I just have a single GPU and it takes considerable time to run the tests, so bear in mind that this is based on very few runs.

I’d say though that both ricap and cutmix seem to be very competitive with mixup (if not slightly better), so it may be worth trying them.
Something interesting is that the impact on training time is negligible.

You can find the notebook and required code in fast_extensions, where I’m planning to share some additional fastai code that I’m creating.

Please, feel free to use this code as you wish. I’d be interested to know if you use ricap or cutmix and get any performance improvement.


This is really, really great work @oguiza!

I’m building a new CNN today for work and will put ricap to use.

More interestingly, I will try to run all three for 20 epochs while adjusting the lr rather than keeping it fixed. That way lr is not an issue in limiting their potential during testing. I held off on earlier testing b/c of this: if you use a fixed learning rate the entire time, how do you know whether it held one method back, was too high for another, etc.?

Thanks also for the visualizer for the multi-images, that’s a big help. Anyway, I’ll be putting it to use today and thanks for making this notebook and impl.


Ricap + Progressive Sprinkles in action - @oguiza, your code is working great!


Amazing work! I would recommend putting this in the fastai library through a pull request. That way we can use it much more easily.


Also, is it possible to adjust the sizes of the cuts? I know cutout in fastai allows you to do so, and it would be helpful to have such control for RICAP and especially CutMix. For example, you don’t want to make cuts larger than the most important feature and accidentally replace the object of interest in your image. So controlling the size would be helpful to avoid situations like that.
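As a rough idea of what such a control could look like, here is a NumPy sketch of a box sampler with bounds on the cut size. It is not an existing option of the callbacks discussed here. The function `bounded_cut_box` and the parameters `min_frac` and `max_frac` are hypothetical. The box area is drawn between the two fractions (up to rounding) and the box is placed fully inside the image, so it is never clipped down to a sliver.

```python
import numpy as np

def bounded_cut_box(H, W, rng, min_frac=0.1, max_frac=0.4):
    """Sample a box covering roughly min_frac..max_frac of an H x W image.

    Assumes 0 < min_frac <= max_frac <= 1. The box keeps the image's aspect ratio.
    """
    frac = rng.uniform(min_frac, max_frac)  # target area share
    side = np.sqrt(frac)                    # same scale on both axes
    cut_h = max(1, int(round(H * side)))
    cut_w = max(1, int(round(W * side)))
    # Choose the top-left corner so the whole box stays inside the image.
    y1 = int(rng.integers(0, H - cut_h + 1))
    x1 = int(rng.integers(0, W - cut_w + 1))
    return y1, y1 + cut_h, x1, x1 + cut_w
```

Because the box is never clipped, the label weight can be taken directly from its area, `(y2 - y1) * (x2 - x1) / (H * W)`.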

I agree with this issue - I’ve seen some images where only a sliver is showing and doubt it’s enough to contribute learning.

I see that your progressive sprinkles method helps solve that issue for cutout. Very interesting work!


I just ran 20 epochs each on ImageNette using all four data-augmentations (thanks to @oguiza’s great work).
Thus: Cutmix / Mixup / Ricap / Progressive Sprinkles
Model: XResNet50, True Wd, Relu activation, bs=50, size = 128
LR = dynamic - Of interest though, I used my own impl of a dynamic LR tuner, with the idea that testing using a fixed LR could favor one or the other…this way, it’s using the loss landscape as feedback to adjust LR and not constraining it.

Here’s the summary:
Best accuracy (%), 20 epochs:

88.6 - progressive sprinkles
86.6 - cutmix
86.4 - mixup
84.6 - ricap (note - very steady progress)

I’m now going to test progressive sprinkles + each …
update:
cutmix + sprinkles = 87%
mixup + sprinkles = 83.6%
ricap + sprinkles = 82%

I had hoped that combining sprinkles + one of the major ones would be additive, but it clearly was not.

I am surprised that sprinkles did so well, so I’m going to do some more work on that.


The sprinkles idea is awesome and intuitively makes so much sense. I am definitely going to experiment extensively with it (likely with more diverse shapes/intensities).

In the case of mixup, did you apply sprinkles on the mixed image or did you apply sprinkles to both images individually and then mix them? I would expect the latter one to be more promising, but I am not absolutely certain which one you used.
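To make the two orderings concrete, here is a small NumPy sketch. The `sprinkle` function is only a cutout-like stand-in that zeroes a few small squares. It is an assumption for illustration and not the actual progressive sprinkles implementation, and all function names are hypothetical.

```python
import numpy as np

def sprinkle(img, rng, n=10, size=4):
    """Stand-in for sprinkles: zero out n small squares (hypothetical)."""
    out = img.copy()
    H, W = out.shape[:2]
    for _ in range(n):
        y = int(rng.integers(0, H - size + 1))
        x = int(rng.integers(0, W - size + 1))
        out[y:y + size, x:x + size] = 0.0
    return out

def mixup(a, b, lam):
    """Pixel-wise blend of two images."""
    return lam * a + (1.0 - lam) * b

def sprinkle_after_mix(a, b, lam, rng):
    # Option 1: blend first, then punch holes in the blended image.
    return sprinkle(mixup(a, b, lam), rng)

def sprinkle_before_mix(a, b, lam, rng):
    # Option 2: punch holes in each image separately, then blend.
    return mixup(sprinkle(a, rng), sprinkle(b, rng), lam)
```

In option 1 every hole is fully blank in the final image. In option 2 a hole in one image still shows the other image's (scaled) pixels unless the two holes happen to overlap, so less information is removed overall.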
