CutMix > Mixup (?)

Yes, I used a notebook to build the code, but it doesn’t contain anything else.
I can add some examples to help visualize how each of the callbacks works, and run some tests to check whether they improve performance. I’m interested in that myself, and happy to share.

1 Like

.ricap() and .cutmix() now available

I’m sorry for the delay, but I’ve been busy with some other stuff.
I’ve finally created a notebook that contains examples of how ricap and cutmix can be used. You just need to add .ricap() or .cutmix() to the learner, in the same way we do with mixup.
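For example, a minimal sketch assuming fastai v1, Imagenette, and that running the fast_extensions code patches .ricap() and .cutmix() onto Learner (the exact import mechanics are an assumption - see the repo/notebook for the real entry point):

```python
from fastai.vision import *
# Assumption: the fast_extensions code has been run/imported so that
# .ricap() and .cutmix() are available on Learner, like fastai's .mixup().

path = untar_data(URLs.IMAGENETTE_160)
data = ImageDataBunch.from_folder(path, valid='val', size=128, bs=64)
learn = cnn_learner(data, models.resnet34, metrics=accuracy)

learn = learn.ricap()   # or learn.cutmix(); fastai's learn.mixup() works the same way
learn.fit_one_cycle(5)
```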

I’ve also created some functionality to visualize how single-image transforms (flip, rotate, etc.) and multi-image transforms (mixup, ricap and cutmix) modify the input. I think it really helps to see the output of these transformations.

I’ve run some very brief tests on Imagenette to check that the callbacks work correctly, and performance was: ricap > cutmix > mixup. I only have a single GPU and the tests take considerable time to run, so bear in mind that these results are based on very few runs.

I’d say though that both ricap and cutmix seem to be very competitive with mixup (if not slightly better), so it may be worth trying them.
Something interesting is that the impact on training time is negligible.

You can find the notebook and required code in fast_extensions, where I’m planning to share some additional fastai code that I’m creating.

Please, feel free to use this code as you wish. I’d be interested to know if you use ricap or cutmix and get any performance improvement.

16 Likes

This is really, really great work @oguiza!

I’m building a new CNN today for work and will put ricap to use.

More interestingly, I will try running all three for 20 epochs while adjusting the LR rather than keeping it fixed - this way the LR doesn’t limit their potential during testing. I held off on earlier testing because of this: if you use a fixed learning rate the entire time, how do you know whether you held one method back while the rate happened to suit another?

Thanks also for the multi-image visualizer, that’s a big help. Anyway, I’ll be putting it to use today - thanks for making this notebook and implementation.

1 Like

Ricap + Progressive Sprinkles in action - @oguiza, your code is working great!

1 Like

Amazing work! I would recommend putting this in the fastai library through a pull request. That way it will be much easier to use.

1 Like

Also, is it possible to adjust the sizes of the cuts? I know cutout in fastai allows you to do so, and it would be helpful to have such control for RICAP and especially CutMix. For example, you don’t want to make cuts larger than the most important feature and accidentally replace the object of interest in your image. So controlling the size would be helpful to avoid situations like that.
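For reference, this is roughly how fastai v1’s cutout exposes that control today - hole count and size are given as ranges and sampled per image (values below are just illustrative; `path` is a placeholder):

```python
from fastai.vision import *

# cutout's n_holes and length accept ranges, sampled per image
tfms = get_transforms(xtra_tfms=[cutout(n_holes=(1, 4), length=(10, 40), p=.5)])
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=128)  # path: your dataset
```

Something similar for the RICAP/CutMix patch distributions would be great.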

I agree with this issue - I’ve seen some images where only a sliver is showing, and I doubt that’s enough to contribute to learning.

I see that your progressive sprinkles method helps solve that issue for cutout. Very interesting work!

1 Like

I just ran 20 epochs each on Imagenette using all four data augmentations (thanks to @oguiza’s great work).
Thus: Cutmix / Mixup / Ricap / Progressive Sprinkles
Model: XResNet50, true weight decay, ReLU activation, bs=50, size=128
LR = dynamic - of note, I used my own implementation of a dynamic LR tuner, on the idea that testing with a fixed LR could favor one method over another. This way the loss landscape is used as feedback to adjust the LR, rather than constraining it.

Here’s the summary:
Best accuracy, 20 epochs:

88.6 - progressive sprinkles
86.6 - cutmix
86.4 - mixup
84.6 - ricap (note - very steady progress)

I’m now going to test progressive sprinkles + each …
update:
cutmix + sprinkles = 87%
mixup + sprinkles = 83.6%
ricap + sprinkles = 82%

I had hoped that combining sprinkles + one of the major ones would be additive, but it clearly was not.

I am surprised that sprinkles did so well, so I’m going to do some more work on that.

6 Likes

The sprinkles idea is awesome and makes so much intuitive sense. I am definitely going to experiment extensively with it (likely with more diverse shapes/intensities).

In the case of mixup, did you apply sprinkles to the mixed image, or did you apply sprinkles to both images individually and then mix them? I would expect the latter to be more promising, but I am not absolutely certain which one you used.

1 Like

This thread is gold, thanks @oguiza and everyone else contributing!

There is already a way to adjust patch sizes.

In the case of ricap, there is a hyperparameter (beta):

The hyperparameter β determines the distribution of boundary position. If β is large, the boundary position (w, h) tends to be close to the center of a patched image and the target class probabilities c often have values close to 0.25. RICAP encounters a risk of excessive label smoothing, discouraging correct classification. If β is small, the boundary position (w, h) tends to be close to the four corners of the patched image and the target class probabilities c often have 0 or 1 probabilities. Especially, with β = 0, RICAP does not augment images but provides original images. With β = 1.0, the boundary position (w, h) is distributed uniformly over the patched images.

They found beta=.3 to provide the best results in their tests. This is the default value in my implementation, although it can be changed to any value in (0, 1).

For cutmix, the hyperparameter that controls how much regularization you want to apply is alpha. The authors indicate that the value that gave them the best result is 1 (meaning that patch sizes will be evenly distributed between 0-50% of the original image size). This is the value set by default in my implementation, although you may want to modify it to any other value in (0, 1).

So, in summary, in both implementations the patch size is not fixed: it varies randomly according to a distribution that you can modify.
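To make that concrete, here’s a small sketch of how each hyperparameter drives patch size, following the RICAP and CutMix papers (not tied to any particular implementation):

```python
import numpy as np

W = H = 128  # image width/height

# RICAP: the boundary position (w, h) is drawn from Beta(beta, beta).
# Small beta (e.g. .3) pushes the boundary toward the corners (more extreme
# patch sizes); beta = 1. spreads it uniformly over the image.
beta = .3
w = int(round(W * np.random.beta(beta, beta)))
h = int(round(H * np.random.beta(beta, beta)))
# the four patches then measure (w, h), (W-w, h), (w, H-h) and (W-w, H-h)

# CutMix: lam ~ Beta(alpha, alpha) is the area ratio kept from the first image;
# the pasted box has sides proportional to sqrt(1 - lam).
alpha = 1.
lam = np.random.beta(alpha, alpha)
cut_w, cut_h = int(W * np.sqrt(1 - lam)), int(H * np.sqrt(1 - lam))
```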

3 Likes

I tested one more intuition, which I’ll call rotational augmentation, and got the highest result of all.
Namely, I did 5 epochs of Mixup (it’s the fastest starter), 5 of Progressive Sprinkles, 5 of CutMix, and finished with Ricap (beta=.5 to try to push larger sections).
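In code the schedule looks roughly like this - a sketch assuming fastai v1 plus the ricap/cutmix patches from fast_extensions; `add_sprinkles` is a hypothetical stand-in for attaching my progressive sprinkles callback:

```python
phases = [
    lambda l: l.mixup(),         # epochs 1-5: mixup (built into fastai)
    lambda l: add_sprinkles(l),  # epochs 6-10: sprinkles (hypothetical helper)
    lambda l: l.cutmix(),        # epochs 11-15: cutmix (from fast_extensions)
    lambda l: l.ricap(beta=.5),  # epochs 16-20: ricap, pushing larger patches
]
base_cbs = list(learn.callback_fns)      # learner callbacks before any augmentation
for attach in phases:
    learn.callback_fns = list(base_cbs)  # drop the previous phase's augmentation
    attach(learn)                        # attach this phase's augmentation callback
    learn.fit_one_cycle(5)
```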

I did it 2x and got 90% and 89.6% respectively.

Thus, while blending things like sprinkles + cutmix didn’t pan out, rotating through them seems to produce a more robust CNN with the highest accuracy.
The next highest as a standalone was progressive sprinkles (88.6%).

It’s more work to keep swapping out augmentations, but this rotational augmentation seems to be the overall winner here.

7 Likes

Wow! I like your ‘progressive sprinkles’ and ‘rotational augmentation’ ideas. Very creative! Thanks a lot for sharing.
I believe that data augmentation is an area where there’s still a lot of potential to improve performance.
So please, @LessW2020, keep up the great work!!
Your ideas have inspired me, and I’m also working on a couple of simple ideas that I may be able to share tomorrow if they work as expected.
I’m curious to know how your rotational augmentation works from a loss perspective - you need to modify the loss function every time you change the augmentation type, correct?

4 Likes

Thanks for your insights. Can you please explain a little what the dynamic LR is?

Thanks @oguiza for the nice feedback! I agree completely that data augmentation is a very underexplored domain.
Re: the loss function - yes, it has to change with each augmentation change to keep everything in sync.
Definitely looking forward to any new ideas you have!

Hi @DrHB,
I’m using a dynamic/automated LR finder that I melded into the fastai framework (well, it breaks the framework, but I have some modified files to accommodate it) - it’s different from the usual fastai setup.

I’m still working with it, but the main point of using it in testing like this is to let each augmentation guide the LR that works best for it, rather than the more typical fixed “LR = x” setting we see in most papers. If the LR is fixed, it may favor one augmentation over another simply because that fixed rate happens to match one augmentation’s ideal LR, rather than what works best for each.
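For intuition (this isn’t my tuner, just the general loss-feedback idea), PyTorch’s built-in ReduceLROnPlateau does a crude version of the same thing:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

# `model` and `train_one_epoch` are placeholders for your own setup
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=2)

for epoch in range(20):
    val_loss = train_one_epoch(model, optimizer)
    scheduler.step(val_loss)  # halve the LR when validation loss stops improving
```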
I’ll try to redo my testing using the fastai LR finder as well.
Another partially automated one was released recently (optimistic AMSGrad) that I’ll try to test too.
Hope that helps!

So, for at least a single 20-epoch test, I was able to beat the leaderboard using progressive sprinkles - 92.6% vs the current leaderboard’s 92%!
(Only one run, so I can’t claim victory yet, but exciting regardless.)

The changes were twofold (see the sketch after this list):
1 - Incremented the probability of any image having sprinkles every 2 epochs instead of every 5 (i.e. a probability of .1, then .2, etc.).
2 - Incremented the batch size every 2 epochs (started small at 30, then 40, etc.) as another form of regularization.
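For the first change, here’s a minimal sketch as a fastai v1 callback - it assumes the sprinkles callback exposes a `p` attribute for the per-image probability (that attribute name is an assumption). The batch size change is easiest done between separate fit calls, since growing it mid-training needs a new DataBunch:

```python
from fastai.basic_train import LearnerCallback

class ProgressiveSchedule(LearnerCallback):
    "Bump an augmentation callback's probability every `every` epochs."
    def __init__(self, learn, aug_cb, step=.1, every=2):
        super().__init__(learn)
        self.aug_cb, self.step, self.every = aug_cb, step, every
    def on_epoch_end(self, epoch:int, **kwargs):
        if (epoch + 1) % self.every == 0:
            # `p` is assumed to be the probability of sprinkling any given image
            self.aug_cb.p = min(1., self.aug_cb.p + self.step)
```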

Tomorrow I hope to try the 2-epoch change with the rotational augmentation and see if that improves things as well.

1 Like

Thank you very much for your response. It will be nice to compare your approach to fastai’s fit_one_cycle.

1 Like

Does anyone have an intuition for why it doesn’t cause a problem that many of the images produced by mixup and variants don’t make sense from a human vision perspective? There seems to be a general sense that augmentation should produce things that look like real examples, which mixup totally violates, yet it works great!