Data augmentation: dynamic blend

This is a powerful abstraction for image data augmentation that generalizes MixUp, Cutout, CutMix, RICAP, and more, and allows annealing of the amount of augmentation during training (to support curriculum learning) - Jeremy on Twitter

I’ve been thinking lately about the new types of image data transforms (random erasing, cutout, mixup, ricap, cutmix, etc.) and have come up with a single transform that can achieve many similar effects. I call it Blend, as I think that is what most of these transformations have in common: they all blend the pixels in a certain area with something else.
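To illustrate the idea (this is just a sketch of the concept, not the actual Blend implementation from the repo), a single blend operation can cover several of these transforms by varying the region and the blend weight:

```python
import numpy as np

def blend(img, other, alpha=0.5, region=None):
    """Blend a rectangular region of `img` with `other` (same shape).

    alpha=1 with `other` = zeros over a random patch reproduces cutout;
    region=None (whole image) with another training image reproduces mixup;
    alpha=1 over a random patch of another image approximates cutmix.
    """
    out = img.copy()
    if region is None:
        region = (0, 0, img.shape[0], img.shape[1])  # full image
    y0, x0, y1, x1 = region
    out[y0:y1, x0:x1] = (1 - alpha) * img[y0:y1, x0:x1] + alpha * other[y0:y1, x0:x1]
    return out
```

Varying `alpha`, the region size, and what `other` contains (zeros, noise, another image) gives the whole family of transforms.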

Many of the ideas I’ve integrated into it come from the original cutout, mixup, ricap and cutmix papers, as well as the great CutMix > Mixup (?) and Progressive Sprinkles (cutout variation) - my new data augmentation & 98% on NIH Malaria dataset threads.

I’d like to thank all participants in those threads, and especially @LessW2020 for creating those threads and sharing lots of creative ideas in them, @rwightman for his ‘frosted sprinkles’ idea, and Jeremy for bringing up ricap (I wasn’t aware of it).

I wanted to learn more about fastai and callbacks (although I know they will be changed soon) and have created a notebook to explain how to use this transform.

Also, related to this, I have created a transform scheduler, TfmScheduler, that lets you modify any parameter of a transform during training. This also stems from an idea @LessW2020 shared in the threads above. I’ve always been intrigued by this and wanted to learn how to code it.
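As an illustration of the scheduling idea (the names and structure below are hypothetical, not the actual TfmScheduler API), a scheduler only needs a function from training progress to a value, plus a hook that writes that value onto the transform at each batch:

```python
def linear_sched(start, end):
    "Return a function mapping training progress pct in [0, 1] to a value."
    return lambda pct: start + pct * (end - start)

class ParamScheduler:
    """Minimal sketch of scheduling a transform parameter during training.

    `tfm` is any object with a settable attribute named `param`;
    `sched` maps training progress (0..1) to the new value.
    """
    def __init__(self, tfm, param, sched):
        self.tfm, self.param, self.sched = tfm, param, sched

    def on_batch_begin(self, pct):
        # Update the transform's parameter based on how far into training we are.
        setattr(self.tfm, self.param, self.sched(pct))
```

In fastai this kind of hook would live in a Callback, which receives the training progress from the training loop.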

I have run a few tests (you will see that there are endless possibilities) and have seen some promising results, but I don’t have the hardware to really test this in depth, as it takes a considerable amount of time in my case.

I have added it to the fastai_extensions repo that I built a few days ago (repo)

I hope some of you may find this useful. In any case it’s been a great learning experience.

PS. I don’t have a CS background (I have a pharmacy degree), so apologies if the code is not at the right level.

Edit: I’ve added above a great description by Jeremy on Twitter of what this code does.


Wow @oguiza - this looks incredible!!

I’m going to run some quick tests with it on ImageNette and 20 epochs just to try it out and will update…really nice to have everything all packaged together like this.
Anyway, thanks for making this and great job!

I was able to spend an hour and a half working with it - it’s truly a ‘dream workbench’ for data augmentation :slight_smile:

I’m still learning how to best apply all these options…in my short testing, I’m finding that for the first epoch you want to stick with either no augmentation or whole image augmentations with very minor augmentation.

I tested out a whole variety of combinations thanks to this new workbench, and if you throw too much at it up front, it just doesn’t learn well. This fits in with the whole curriculum learning paper: kindergartners should get kindergarten problems, college students get college problems.

Thus, in the first epoch I was seeing .33% to .51% with no augmentation or very minor sprinkles. But with larger amounts of augmentation (a la quad ricap, or randomized cutouts) it ended up at .12%, .17%, .21%, etc.

I thus think there is likely an optimal curve for progressing the augmentation, but I’m still trying to figure out what it is…
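One simple way to encode that intuition, sketched here as an assumed example rather than a tested curve, is a schedule that applies no augmentation during a warm-up fraction of training and then ramps up linearly:

```python
def progressive_sched(pct, warmup=0.1, max_amount=0.5):
    """Illustrative schedule: no augmentation during the first `warmup`
    fraction of training, then a linear ramp up to `max_amount` by the end.
    The warmup length and ceiling are made-up values, not tuned ones.
    """
    if pct < warmup:
        return 0.0
    return max_amount * (pct - warmup) / (1 - warmup)
```

Plugged into a parameter scheduler, this keeps the first epoch(s) essentially augmentation-free and increases the amount gradually afterwards.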
Anyway, thanks again @oguiza - this is really a great piece of software!


Thanks @LessW2020! It’s great to know you are using it!
Please, let me know if you find any issue or have any idea you’d like to integrate.

Btw, this morning I fixed a small bug when using same_size = False. I’ve also updated the notebook to show some better examples.
So if you are using the code you may want to use the latest version (repo).


OMG this is soo cool!! thank you very much for doing this!

@oguiza nice work, thanks for sharing with all.

Regarding the noise, one thing to watch for is the potential to mess up the image statistics, especially wrt batch-norm behaviour and the potential for train and test to diverge. I found noise to be better than black cutouts, but noise not calibrated to the dataset mean/std can cause issues. Using gaussian instead of uniform noise has worked well for me when the typical (x - mean) / std normalization is used. I always apply the erasing/cutout after normalization, when the data is float and centered at 0, to avoid applying the mean/std twice and to avoid clipping of the noise, which might again impact the stats.
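A minimal sketch of that recipe (assuming numpy arrays already normalized to zero mean and unit std; this is an illustration, not @rwightman's actual implementation): fill the erased region with standard gaussian noise after normalization, so the patch matches the data statistics instead of perturbing them.

```python
import numpy as np

def noise_cutout_after_norm(img, region, rng=None):
    """Fill `region` (y0, x0, y1, x1) of an already-normalized image
    (zero mean, unit std) with standard gaussian noise. Because the
    noise is N(0, 1), the erased patch has the same statistics as the
    normalized data, so batch-norm sees nothing unusual.
    """
    rng = np.random.default_rng() if rng is None else rng
    y0, x0, y1, x1 = region
    out = img.copy()
    out[y0:y1, x0:x1] = rng.standard_normal((y1 - y0, x1 - x0))
    return out
```

Doing this before normalization instead would require generating noise with the dataset mean/std and clipping to the valid pixel range, which is exactly the double-application and clipping problem described above.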

An example ImageNet batch: