Progressive Sprinkles (cutout variation) - my new data augmentation & 98% on NIH Malaria dataset

I spent the weekend working on trying to beat the highest accuracy I could find (97% & 96%) on the NIH malaria dataset…
and along the way came up with a ‘new’ (to my knowledge) data augmentation that helped me get to 98% on the malaria dataset.

My intuition behind this ‘progressive sprinkles’ augmentation was that cutmix simply wouldn’t work for the malaria dataset. If you clipped in the clean half of an infected cell and showed that, there is nothing the CNN could learn from to tell it the cell was infected. It was therefore pointless to use cutmix or RICAP, as they would mislead the network.

I tried mixup and got reasonable results, but not SOTA, and also tried the usual transforms (flips, etc.). The argument against mixup is that blending can create artifacts in overlapping regions that are present in neither source image…however, both mixup and cutmix outperform the older standard cutout.

Cutout, in its original form of a single large blacked-out block, can also seriously ruin an image by cutting out the infected part of a cell, giving the same result: telling the CNN this visibly clean cell was infected.
Take a look at cutout in its normal form…a single big black box (middle; cutmix and mixup are shown on either side for comparison):

However, if you take a random grid/series of small squares, randomly sprinkle them on the image, and slowly increase their probability and size, you can force the CNN to look more completely at the entire image for classification clues while, in most cases, avoiding blocking out so much data that it can’t truly learn.
Like this:

(And if you don’t know, cutout is already a transform in fastai, with tunable sizes, probability, etc.)
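To make the idea concrete, here’s a minimal numpy sketch of a sprinkles-style transform. The function name, defaults, and interface are my own invention for illustration, not the actual fastai cutout implementation:

```python
import numpy as np

def sprinkles(img, n_holes=8, size=8, p=0.5, rng=None):
    """Black out `n_holes` small squares of side `size` at random positions,
    applied with probability `p`. Illustrative helper, not the fastai API."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    # With probability (1 - p), return the image untouched
    if rng.random() > p:
        return out
    h, w = out.shape[:2]
    for _ in range(n_holes):
        y = rng.integers(0, max(1, h - size))
        x = rng.integers(0, max(1, w - size))
        out[y:y + size, x:x + size] = 0  # black square
    return out
```

Because each square is small, most of the cell stays visible, unlike a single large cutout block.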

Here’s how it looks without sprinkles, for comparison:

I then combined this with the concept of curriculum learning from the dropout paper, where they showed that starting with no dropout and increasing it on a slow gradient produced more robust CNNs than keeping dropout fixed or ‘stepping’ it in large increments.

Thus, I started with no sprinkles and slowly increased the probability, frequency, and size of the sprinkles as training progressed.
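One way to express that progression is a linear ramp over the early epochs. This is a sketch with made-up constants, not the hand-tuned schedule from the actual runs:

```python
def sprinkle_schedule(epoch, max_p=0.75, max_holes=12, max_size=16, ramp_epochs=10):
    """Linearly ramp sprinkle probability, count, and size from ~0 to their
    maxima over `ramp_epochs` epochs (illustrative values only)."""
    t = min(epoch / ramp_epochs, 1.0)  # fraction of the ramp completed
    return {
        "p": t * max_p,
        "n_holes": max(1, round(t * max_holes)),
        "size": max(1, round(t * max_size)),
    }
```

At epoch 0 this yields essentially no occlusion, and by `ramp_epochs` the transform is at full strength, mirroring the curriculum-dropout idea.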

The result was the best I was able to get after two days of working on it: 98% (repeatedly), higher than the original malaria paper with a custom-built CNN (96%) and several other articles (96% in custom Keras, and 97%…with fastai!).

I did the progression by hand, so next I’m going to write a callback that automates it across training runs and post it if anyone is interested. I’d also like to test this new sprinkle augmentation on Imagenette to see how it compares vs cutmix, RICAP, etc.

My other idea is to change from black cutouts to partially see-through ones (i.e. reduced opacity) and see if that helps, as it would allow for another aspect of learning (less clear, but structure would still show to some degree).
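The reduced-opacity idea amounts to darkening a patch toward black rather than replacing it outright. A small sketch (function name and alpha value are hypothetical):

```python
import numpy as np

def translucent_sprinkle(img, y, x, size, alpha=0.5):
    """Darken one square region toward black with opacity `alpha`,
    instead of fully blacking it out (sketch of the idea above)."""
    out = img.copy()
    patch = out[y:y + size, x:x + size]
    # alpha=1.0 would reproduce a normal black cutout; alpha=0.0 is a no-op
    out[y:y + size, x:x + size] = patch * (1.0 - alpha)
    return out
```

With alpha below 1.0, cell structure remains faintly visible through the patch, so the network still gets a weak signal from the occluded region.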

Anyway, I was happy to be able to take an idea and run it through its paces and have it perform nicely, at least on this dataset. And thanks to @sgugger and @xnutsive for putting cutout in as an augmentation!


@LessW2020 Nice result. Curious if you’ve tried ‘frosted sprinkles’? :slight_smile: As in, using noise instead of a constant black or dataset mean value.

I’ve used random erasing (same idea as cutout) quite a bit, but have found that per-pixel noise has always worked better for me than a constant value. The original Random Erasing paper used uniform noise, but I use Gaussian, applied after image normalization.
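A minimal sketch of the ‘frosted’ fill described above, assuming the image has already been normalized to roughly zero mean and unit variance (function name is my own):

```python
import numpy as np

def frosted_erase(img, y, x, size, rng=None):
    """Fill the erased square with per-pixel Gaussian noise rather than a
    constant; assumes `img` is already normalized (mean ~0, std ~1)."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    region = out[y:y + size, x:x + size]
    # Standard-normal noise matches the statistics of a normalized image
    out[y:y + size, x:x + size] = rng.normal(0.0, 1.0, size=region.shape)
    return out
```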


Hi Ross,
Thanks for the kind words re: result.
I’m definitely planning on extending the style of the sprinkles, and appreciate your letting me know about the ‘frosted’ aspect (plus the name is great!).
Of interest, I just tested with RICAP / Mixup / Cutmix / Progressive Sprinkles and, much to my surprise, progressive sprinkles came out the best in a 20 epoch contest. I had felt sprinkles would be an addition to one of the big three, but the combinations all ended up worse than each augmentation standalone.

Anyway, thanks a ton for the frosted idea - let me code it up and test it out and will post an update!


I’m super excited to report that after thinking about it a bit more, I tried refining the implementation of how the sprinkles were applied and for at least the first 20 epoch run, set a new high vs the current Imagenette leaderboard!
(to be fair, I will need to repeat this multiple times, but still, I’m super happy to have even beaten Jeremy for one 20 epoch run lol).

It was bugging me today: if the sprinkles performed so well, why wasn’t I beating the leaderboard? Upon further thought, I made these changes:
1 - I implemented the sprinkles in more refined increments. Specifically, instead of changing the probability every 5 epochs, I updated it every two epochs.
Thus, the probability of any image having sprinkles went from:
0% (first 2 epochs)
20% (epochs 2-4)
30% (epochs 4-6)

2 - I also did the same thing with batch size, increasing it by 10 every 2 epochs (basically another form of regularization). I started with 30 images per batch for the first 2 epochs, then 40 for epochs 2-4, 50 for 4-6, etc.
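Both of the two-epoch stepping rules above can be captured in one small helper. The step sizes here are illustrative (the exact probability increments were hand-tuned and later amended in this thread):

```python
def two_epoch_schedule(epoch, start_bs=30, bs_step=10, p_step=0.1):
    """Step sprinkle probability and batch size every 2 epochs.
    Returns (probability, batch_size) for the given epoch (sketch only)."""
    stage = epoch // 2                     # which 2-epoch block we are in
    p = min(stage * p_step, 1.0)           # 0.0, 0.1, 0.2, ... capped at 1.0
    bs = start_bs + stage * bs_step        # 30, 40, 50, ...
    return p, bs
```

A training loop would query this at the start of each epoch and rebuild the dataloader when the batch size changes.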

Which ended with this:

and an example of how the sprinkles looked during the training:

I’ll have to do some more repeats of this before I can claim I’ve truly beat it, but regardless, happy to see the sprinkles continuing to provide excellent results.


What are frosted sprinkles?

Do you instead mean 0% -> 20% -> 30% for the probability of adding sprinkles? I am also curious what size of ‘sprinkles’ you are using, and at what rate you increase the frequency and size of the holes.

Could you provide error bars? At accuracies as high as those, I think error bars are interesting to look at.

Nice work! I have a few comments about getting imagenette results based on my experience.

For Imagenette, note that there is quite a bit of variance from run to run, so you will need to run it a bunch of times.
Also, I would rerun the baseline (multiple times). I reran baselines on Imagewoof and got different (better) results than Jeremy, the reason (I think) being that I was running with only 1 GPU, so the learning rates are not equivalent.
Finally, make sure your results are statistically significant. You could use this test:

You’ll get to statistical significance faster if you look at validation loss (less variance), which is useful; but I guess what we all care about is accuracy.

Finally, check whether you are beating the baseline for equivalent run times. Basing results on epochs only can be misleading.


Amazing Work, can you share the notebook please?
I tested Mixup and Cutmix on ImageNet for 200 epochs.
Mixup alpha=0.2 top-1: 78.486
Cutmix alpha=1 top-1: 78.982

Hi Hussam,
I’ll make a notebook once I implement it as a callback…right now I’m adjusting the probabilities by hand every 2 epochs.
Re: ImageNet and 200 epoch - do you mean ImageNette or the full ImageNet proper?

Thanks for the feedback Seb! I’m definitely going to rerun multiple times but I want to implement the full thing as a callback first since right now I’m manually adjusting it every 2 epochs…the sprinkles are more effective than I expected, so I’ll put more time into automating it via callback.


Hi James,
Yes, my mistake - I updated my post but I meant 0%,10%, 20% etc.
I’m currently keeping the sprinkles in range for size and number and just adjusting the % probability of it having sprinkles or not.
I’m going to turn it into a callback so the p progression is automated, and will likely have it compute the total % coverage created by the sprinkles as another parameter.
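The planned callback could look something like this framework-agnostic sketch: the training loop calls it at the start of each epoch and it bumps the transform’s probability every 2 epochs. All names here are hypothetical, not the eventual fastai implementation:

```python
class SprinkleScheduler:
    """Sketch of the planned callback: updates `transform.p` every
    `every` epochs by `p_step`, capped at `max_p` (names hypothetical)."""

    def __init__(self, transform, p_step=0.1, every=2, max_p=1.0):
        self.transform = transform  # any object exposing a mutable `.p`
        self.p_step = p_step
        self.every = every
        self.max_p = max_p

    def on_epoch_begin(self, epoch):
        # Same stepping rule as the hand-tuned runs: +p_step per 2-epoch block
        self.transform.p = min((epoch // self.every) * self.p_step, self.max_p)
```

In fastai this logic would live in a callback hooked into the training loop rather than being called manually.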

Hi Kushaj,
He means that instead of having black boxes like in my images, you fill the cutouts with random noise (like on a TV set with no signal coming in) rather than black, and thus they look a bit frosted :slight_smile:

Yes, Trained ImageNet dataset on XResNet


Got it.

So how do you decide on the noise? One way I can think of is to pick a random value for each RGB pixel in the range (0, 255). Or is there some other way?
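For reference, the two fill strategies mentioned earlier in the thread can be sketched like this (values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
patch_shape = (8, 8, 3)

# Option 1: uniform integers in the raw pixel range, as in the
# original Random Erasing paper (for un-normalized uint8 images)
uniform_patch = rng.integers(0, 256, size=patch_shape).astype(np.uint8)

# Option 2: Gaussian noise, suited to images already normalized
# to roughly mean 0 / std 1
gaussian_patch = rng.normal(0.0, 1.0, size=patch_shape).astype(np.float32)
```

Which one fits depends on whether the erase is applied before or after normalization.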


I’ve just added a 2nd notebook to the fastai_extension folder that I created a few days ago. It contains a new callback to create multiple types of transformations; I call it Blend.
I’ve also managed to create a Scheduler to dynamically change one or multiple parameters of any transform.
I’ve posted it in a new thread, Data Augmentation: dynamic blend.
I’d love to get your feedback.


I saw the notebook. Very informative.


Thanks for sharing this. Please share the notebook once you finish it. By the way, SOTA based on the original paper from NIH was 98.6%. Initially I got confused after reading the original article; then I contacted the authors.


Super article Less, I’m enjoying reading your others too, so keep it up! I made a quick function based on this to create a couple of different styles of sprinkle, with controllable size and percentage of occlusion: github here. It might be useful for others to take and improve, as I’m sure it’s not as efficient as it could be, but it was fun to make. I know people here are using PyTorch, but I’m just starting the course and have previously been using Keras, so I had hoped to fit it into the augmentation flow, but it seems a little tricky.


Is there a callback implementation of this available?