A challenge for you all

I’ve put a challenge out on Twitter, and I’ll put it here too: what’s the best accuracy you can get on Fashion-MNIST in 5, 20, or 50 epochs?

But you folks are in this course and that’s an unfair advantage, so you (and I) have a special restriction – you’re only allowed to use techniques that follow the rules of this course (i.e. they have to have been first reimplemented from scratch in pure Python, or using only things that have been reimplemented from scratch).

Reply below with your best score! And tell us how you went about it, if you like. :slight_smile: I’ll keep track of best scores below – I hope one of you manages to beat me!

Epochs Accuracy % Username
5 94.9 @christopherthomas
20 95.7 @christopherthomas
50 95.8 @christopherthomas

Hall of fame

These people have been on the above leaderboard at some point:


Even if you don’t have a result that’s better than the leaderboard, post here anyway what you’ve tried, how it went, and what you’re planning next!

I tried implementing my own dropout neural network module, then added a dropout layer after each ResBlock. With some experimentation, a dropout probability of 0.1 seemed to result in the best improvement.

I’ve managed to get to 93.0% accuracy after 5 epochs of training based on the original model. Then 94.3% accuracy using the same approach with model 3 after 20 epochs of training (without test time augmentation). No improvement on 50 epochs though (94.5%).


Nice :slight_smile: Can you share your notebook?

Here’s my notebook

I had replaced my Dropout module with nn.Dropout as that runs faster.

I did try test time augmentation with the updated model 3 trained for 20 epochs, although the accuracy was the same.


Here is my Mixup from scratch approach.

I ran it once for 50 epochs and got up to 94.1. That’s when I realized that I’m not competing with the stats from the repo but with those from this thread :sweat_smile:

Then I created this somewhat cleaned up version to share some intermediate results. Hope that I can get a little closer within the next few days. Might need to implement Dropout from scratch, too :upside_down_face:


After fixing the batchnorm problem noticed by @piotr.czapla we’re now all even for 5 epochs, and I’m a bit ahead for 20 epochs. We’d better try combining dropout and batchnorm!


A good idea from @christopherthomas was to try using Dropout, which tied the best batchnorm result for 5 epochs.

Then I discovered that replacing Dropout with Dropout2d is even better!
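For anyone wondering what the difference is: regular dropout zeroes individual activations, while Dropout2d zeroes whole channels of a conv feature map. Here is a minimal from-scratch sketch of the channel-wise version, assuming NCHW input. The class name `ChannelDropout` is made up for illustration and is not code from the course repo.

```python
import torch
from torch import nn

class ChannelDropout(nn.Module):
    """Hypothetical from-scratch stand-in for nn.Dropout2d (assumes NCHW input)."""
    def __init__(self, p=0.1):
        super().__init__()
        self.p = p

    def forward(self, x):
        # Dropout is only active in training mode
        if not self.training or self.p == 0:
            return x
        keep = 1 - self.p
        # One Bernoulli draw per (sample, channel); broadcast over height and width
        probs = torch.full((x.shape[0], x.shape[1], 1, 1), keep,
                           device=x.device, dtype=x.dtype)
        mask = torch.bernoulli(probs)
        # Rescale so the expected activation is unchanged
        return x * mask / keep
```

Because the mask has shape (N, C, 1, 1), every pixel in a channel is kept or dropped together.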

Congrats @JackByte for passing 94% and getting mixup working! :smiley:


I’ve managed to get an accuracy of 95.2% for 50 epochs. I got 95.0% with your batch norm changes, dropout and test time augmentation. Then the further improvement was from only having the 2nd dropout layer (The second run with get_model6 in my notebook).


I created a custom data sampler which gets an accuracy of 93.3% for 5 epochs. [not tested on the 20 or 50 epoch challenge yet]

The sampler looks at the loss from the previous epoch and drops x% of training images with the lowest loss. It replaces them with x% of the training images with the highest loss. This gives the model two opportunities to train on the most challenging images.
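Roughly, the index selection can be sketched like this. This is a simplified illustration in plain Python rather than the notebook code; the function name `hard_example_indices` and its parameters are made up, and it assumes you already have one loss value per training item from the previous epoch.

```python
import random

def hard_example_indices(losses, frac=0.1, shuffle=True, seed=None):
    """Return one epoch of dataset indices (hypothetical helper).

    The `frac` lowest-loss items are dropped and the `frac` highest-loss
    items appear twice, so the epoch keeps its original length.
    """
    n = len(losses)
    k = int(n * frac)
    # Indices sorted from lowest loss (easiest) to highest loss (hardest)
    order = sorted(range(n), key=lambda i: losses[i])
    kept = order[k:]         # everything except the k easiest items
    hardest = order[n - k:]  # the k hardest items, added a second time
    idxs = kept + hardest
    if shuffle:
        random.Random(seed).shuffle(idxs)
    return idxs
```

The returned list could then drive a sampler for the next epoch’s DataLoader.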

So far I’ve found that replacing 10% of the dataset before the second epoch works best but I’ve only really scratched the surface. There are much more sophisticated approaches like only dropping images below a certain loss or images with a low stable loss for consecutive epochs, etc…

The code is a bit hacky at the moment but here’s the notebook I’m using. The model is a replica of the one used in the augment notebook except it uses the custom data sampler described above.

EDIT: On the last couple of runs I’m now getting 93.4% accuracy on the 5 epoch challenge. The only change I made is calling set_seed(42) just before instantiating the learner variables. I wonder if the sampler is introducing some flake :thinking:


I’ve found the same holds for 5 and 20 epochs: keeping only the last dropout layer gives better accuracy. The dropout layer also seems to give marginally better accuracy when placed before the flatten layer.

For 5 epochs I’ve also got 93.3% - my notebook. For 20 epochs 94.7% accuracy - my notebook. Test time augmentation didn’t have an improvement here when I tried.


Great approach. This general idea is called “curriculum learning” FYI.


I tried downloading @tommyc 's notebook, and ran it on Jarvis Labs instead of Google Colab. Strange thing is that with seed 42, I was getting a value less than 93.4%, so I ended up playing with the seed until I got something close to what Tommy shared. Wondering if there are some weird platform differences that lead to a different result despite the same seed.

Anyways, I noticed that Tommy didn’t use Dropout in his notebook, so I added Dropout to his model and got an accuracy of 93.5% on 5 epochs with this notebook. This ran on Jarvis Labs, so I’m curious if this could be reproduced with Google Colab (Colab seems pretty slow for me, probably since I’m on a free plan).

I also ended up noticing that the Dropout code seems to have a mistake in the course22p2 repo and the subsequent notebooks in this thread. The original line was:

        dist = distributions.binomial.Binomial(tensor(1.0).to(x.device), probs=1-p)

But I believe it should be:

        dist = distributions.binomial.Binomial(tensor(1.0).to(x.device), probs=1-self.p)
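For context, here is a sketch of how that line sits inside a complete module. It is my reconstruction under the assumption that the class stores `p` in `__init__`; everything apart from the `dist = ...` line is illustrative rather than copied from the repo. With a bare `p`, the forward pass would not use the probability passed to the constructor.

```python
import torch
from torch import nn, tensor, distributions

class Dropout(nn.Module):
    def __init__(self, p=0.1):
        super().__init__()
        self.p = p  # probability of zeroing an activation

    def forward(self, x):
        # Dropout is a no-op at inference time
        if not self.training:
            return x
        # Each draw is 1 (keep) with probability 1 - self.p, otherwise 0 (drop)
        dist = distributions.binomial.Binomial(tensor(1.0).to(x.device), probs=1-self.p)
        # Rescale the kept activations so the expected value is unchanged
        return x * dist.sample(x.size()) / (1 - self.p)
```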

Thanks for pointing that out - we noticed that a couple of days ago so it’s fixed in the repo IIRC.


I was able to achieve 94% in 5 epochs by reducing batch size to 256, and adding Mish activation to @tommyc 's curriculum learning attempt and adding Dropout. The colab notebook is here.
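In case it helps anyone following the from-scratch rule, Mish is defined as x * tanh(softplus(x)). A minimal sketch of it as a module is below; it is an illustration and may differ from what is in the notebook.

```python
import torch
import torch.nn.functional as F
from torch import nn

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x):
        # softplus(x) = log(1 + exp(x)), computed in a numerically stable way
        return x * torch.tanh(F.softplus(x))
```

It can be dropped in wherever the model currently uses its activation module.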

I do see the accuracy varying with different runs though so there seems to be some randomness despite setting a seed. I see the accuracy varying between 93.7 and 94.0.

Although this challenge is for Fashion MNIST, I swapped the dataset to MNIST and was surprised to see it achieving 99.6% accuracy on test set (99.9% in training) within 5 epochs!

The notebooks for both are in the repo here.


Wow that’s a big jump! Nice one :smiley: @Diganta will be pleased to see the success of Mish.


Are these results against the test set?

Of course! :smiley:


wow, some amazing results!
I started playing with other archs today, and the ConvMixer/MLP-Mixer models from timm (custom smaller variants) train very well. I encourage you to try them.
Got a 2M param ConvMixer to 9.38% in 5 epochs!

PS: A cool trick is ramping the one cycle faster, with pct_start=0.1 for these models.
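To show what that does to the schedule, here is a small sketch using PyTorch’s `OneCycleLR`, whose `pct_start` defaults to 0.3. The toy model, `max_lr`, and step count are placeholder assumptions, not the settings from my runs.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import OneCycleLR

model = nn.Linear(4, 2)  # placeholder model, only here to supply parameters
opt = torch.optim.SGD(model.parameters(), lr=0.1)
total_steps = 100        # assumed: epochs * batches per epoch
sched = OneCycleLR(opt, max_lr=0.1, total_steps=total_steps, pct_start=0.1)

lrs = []
for _ in range(total_steps):
    lrs.append(sched.get_last_lr()[0])  # learning rate used for this step
    opt.step()                          # the real training step would go here
    sched.step()

# Index of the step with the highest learning rate
peak_step = max(range(total_steps), key=lambda i: lrs[i])
```

With `pct_start=0.1` the warm-up takes about the first 10% of the steps, which leaves the rest of the run for annealing.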


Managed to get to 94.2 in 5 epochs and 94.7 in 20. Also an incredibly low validation loss of .05

Still using ConvMixer (with some tricks)
