Achieving top 10% in the MNIST competition on Kaggle

Hey everyone,
I am just getting started with FastAI and am trying to implement what we learned in lesson 1 on the Digit Recognizer competition on Kaggle. I am currently in the top 22%, using ResNet18 with 30 epochs and a learning rate of 1e-6.
How can I improve it?
Below is the link to my Notebook. Any suggestions would be really appreciated.

https://www.kaggle.com/sanwal092/fastai-lesson-1-mnist-implement
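
For reference, the training setup in the notebook is roughly the following (a minimal sketch with placeholder paths; it assumes the competition CSVs have already been converted into per-class image folders, which is not shown here):

from fastai.vision import *

# Placeholder path -- adjust to wherever the 28x28 digit images were written out.
data = ImageDataBunch.from_folder('../input/train', valid_pct=0.1,
                                  ds_tfms=get_transforms(do_flip=False),
                                  size=28, bs=64).normalize(mnist_stats)

# cnn_learner is called create_cnn in older fastai v1 releases
learn = cnn_learner(data, models.resnet18, metrics=accuracy)
learn.fit_one_cycle(30, max_lr=1e-6)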


Working on the same stuff here.
Consider Resnet50 for a change.
I see you ran considerably more epochs with good effect.

Yes. I tried ResNet18 with 30 epochs. It took almost 1.5 hours on the Kaggle GPU. I am going to try ResNet34 with around 15 epochs and see what happens.

@sinsji
Update 1: Tried ResNet34, 15 epochs with lr = 4e-7. Training took about 30 minutes on the Kaggle GPU. My position improved by 2 places, with no change in the score percentage. Will try 20 epochs next.

UPDATE 2: Reached 99.10% with ResNet34, 20 epochs, lr = 4e-6. I have reached 99.428% using TensorFlow with a custom neural net architecture, and I am trying to beat that.
Moving to ResNet50 now.

UPDATE 3: Tried ResNet50 with 15 epochs, lr = 4e-6. The score fell to 97.5%. This architecture takes longer to train and might be overkill.

UPDATE 4: MAJOR UPDATE. I cracked the top 13%. Used ResNet34 with 30 epochs, a batch size of 256, and a learning rate range of slice(1e-3, 1e-2). I normalized the data not with mnist_stats but with imagenet_stats.
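
In case anyone wants to reproduce this, the configuration looks roughly like the following (a sketch only; the path and transforms are placeholders rather than copied from my kernel):

from fastai.vision import *

# Placeholder path; bs, epochs and lr range match UPDATE 4 above.
data = ImageDataBunch.from_folder('../input/train', valid_pct=0.1,
                                  ds_tfms=get_transforms(do_flip=False),
                                  size=28, bs=256)
data.normalize(imagenet_stats)   # imagenet_stats instead of mnist_stats

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(30, max_lr=slice(1e-3, 1e-2))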

Okay interesting.
I’m thinking about useful transformations myself.
With the large number of training images, I doubt whether transformations are very useful.
Good luck.

I will keep adding my results to the comment above as I iterate, if you would like to check back!

I cracked top 13%

Congrats on the improvement.

Why the 256 batch size? Is there any reasoning behind why it could make a difference?
Do you use a continuous run of 20-30 epochs, or do you run fit_one_cycle multiple times?
I tend to run it a few times, based on the course examples, although I don't know if it makes a difference. Many Kaggle examples just run it once for many epochs.

I saw that batch size in another kernel, which reminded me that in the fit_one_cycle paper the author found that larger batch sizes led to better results. I don't recall the reasoning, but it is a complicated and lengthy paper which I will have to read a few more times.

I run fit_one_cycle twice: once before running the learning rate finder, and then for 20-30 epochs by passing that number to fit_one_cycle, roughly as sketched below.
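
Roughly like this (the learner name and the short first-pass epoch count are illustrative, not the exact kernel settings):

learn = cnn_learner(data, models.resnet34, metrics=accuracy)

learn.fit_one_cycle(4)        # short first pass with the frozen head
learn.lr_find()               # then explore learning rates
learn.recorder.plot()         # pick a range from the plot

learn.fit_one_cycle(30, max_lr=slice(1e-3, 1e-2))   # long second pass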

I think you meant to say 97.5% with Resnet50?

I’ve been trying with ResNet50 and I’m able to get as high as 99.1% by unfreezing and running fit_one_cycle(1), roughly as sketched below.
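
Something along these lines (a sketch; the head-training epochs and the learning rate slice are my guesses at sensible values, not exact settings):

learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(5)                       # train the head first

learn.unfreeze()                             # then fine-tune all layers
learn.fit_one_cycle(1, max_lr=slice(1e-6, 1e-4))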

Thanks for sharing your kernel. I’m currently trying imagenet normalization to see if I get any improvement with ResNet50.

Thanks for catching the error in my percentage.

I tried ResNet50 but it just took too long to train. Let us know how it works out for you.

I’m running into an error

TypeError: argument of type 'PosixPath' is not iterable

and I noticed that I've found this error in your Kaggle kernel and commented on it before. Did you do something to resolve it, or did it go away on its own?

It’s really strange but it keeps coming up now and then for me and I can’t figure out why!

James,
I think that the pathlib library in Fastai conflicts with that of Kaggle; it is probably due to different Python versions. You might be getting that error when you are building your ImageDataBunch object. If so, replace your code to match the following as needed:

data = ImageDataBunch.from_folder(
    path="…/train",     # plain string paths instead of fastai-built Path objects
    test="…/test",
    valid_pct=0.1,
    bs=256,
    size=28,
    num_workers=0,
    ds_tfms=tfms)

That is, pass plain string paths instead of the Path objects you were building with Fastai. Hopefully this helps!


Hi Sanwal,

Looks like you’re doing everything right - but I’m new to fastai too.

If you’re interested, I’ve written a web app to make it easy to create your own MNIST-style images (and make predictions on them): https://github.com/pete88b/data-science/tree/master/myohddac
I’m guessing you can’t train on additional data for the competition, but it’s been good for me to see what the model predicts when I feed it my handwriting.