Training a model from scratch: CIFAR 10

I’ve just pushed a couple of things that the iceberg competition folks in particular will find interesting…

Here’s a notebook showing how to train CIFAR 10 from scratch, which also serves as a foundation for any use of custom networks you may want to try:

The notebook uses the CIFAR10-tuned ResNext architecture from here: . I’ve also made the pretrained weights available here: . You can use learn.load() to load them, after creating the learner using the steps shown in the notebook. The data to train it yourself from scratch is here:

My guess is that this architecture should work well for the iceberg competition. The pretrained weights may be a little helpful, but I’m not sure either way…

(You could easily change the resnext definition to handle 2 channel input directly BTW. Would be a good exercise.)
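As a rough sketch of what that exercise might look like in plain PyTorch (the `SmallNet` class here is an illustrative stand-in, not the actual fastai resnext definition — the point is just that the input stem's `in_channels` is the only thing that needs to change):

```python
import torch
import torch.nn as nn

# Stand-in for a resnext-style definition; only the stem's in_channels matters here.
class SmallNet(nn.Module):
    def __init__(self, in_channels=3, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, 16, kernel_size=3, padding=1, bias=False)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        x = torch.relu(self.stem(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)

# Two-channel input, e.g. the iceberg competition's two radar bands
model = SmallNet(in_channels=2)
x = torch.randn(4, 2, 75, 75)
out = model(x)
print(out.shape)  # torch.Size([4, 10])
```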


Wow! Thank you so much!

Hey @kcturgutlu, got something much better now! Try `from fastai.models.cifar10.senet import SENet18` and grab the weights from . This is the 'squeeze and excitation network' that won ImageNet this year, trained on CIFAR-10 to 94.1% accuracy :slight_smile:

It’s also really really fast!!!


So exciting thanks :smiley:

ELI5 - Can you elaborate on how to load this model?

Follow the steps in the notebook in the top post. Let us know if you get stuck…

What was the intuition behind increasing image size during training?

Section 4 of the Snapshot Ensembles paper indicates that the standard augmentation technique for the CIFAR datasets is to:

  • zero-pad with 4 pixels on each side

  • randomly crop to produce 32 x 32 images

  • horizontally mirror with probability 0.5

Cell 4 of your notebook does the flipping and padding. Is the cropping also accounted for?


Yes, well spotted! When you pad, it random crops by default :slight_smile:
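For anyone curious what that pad-then-random-crop actually does, here's a minimal numpy sketch of the standard CIFAR augmentation reimplemented by hand (this is illustrative, not the fastai internals):

```python
import numpy as np

def pad_and_random_crop(img, pad=4):
    """Zero-pad each side by `pad` pixels, then random-crop back to the original size."""
    h, w = img.shape[:2]
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode='constant')
    top = np.random.randint(0, 2 * pad + 1)
    left = np.random.randint(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

img = np.random.rand(32, 32, 3)          # a fake CIFAR image
aug = pad_and_random_crop(img, pad=4)
print(aug.shape)  # (32, 32, 3)
```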

Thank you.

Here’s an example of everything needed to use the pretrained model for 94% on CIFAR (except for get_data):
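In plain PyTorch terms, the gist of loading pretrained weights (which is what `learn.load()` does under the hood) is just a state-dict round trip. A hedged sketch with a toy stand-in model, not the actual SENet18:

```python
import io
import torch
import torch.nn as nn

# Stand-in model; in the real notebook you'd build the learner and call
# learn.load('sen_32_32_8') instead.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 10),
)

# Save and reload the weights (an in-memory buffer stands in for the weights file)
buf = io.BytesIO()
torch.save(model.state_dict(), buf)
buf.seek(0)
model.load_state_dict(torch.load(buf, map_location='cpu'))

out = model(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 10])
```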


This is amazing, thank you!! Not to be greedy but are the ImageNet weights available as well? :grin:

Hi @jeremy, I know that we can’t have floating padding, and you also recommended using the original image size as sz. In this case we have 75x75 images and pad = sz//8. Should I change the padding or sz? Which makes more sense, and why? Thank you so much again for the amazing update to the library :slight_smile:

Edit: It works with data = get_data(75, 32) but couldn’t understand how pad 75//8 is allowed.
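For what it's worth, `75//8` is allowed precisely because `//` is Python's integer (floor) division, so the padding comes out as a whole number of pixels rather than a floating value:

```python
# // floors the result, so the padding is an integer number of pixels
print(75 // 8)   # 9
print(75 / 8)    # 9.375 -- plain / would give the floating value
```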


What does the wds argument mean?
The code says wds (iterable/float), but I don’t quite get what it does.

learn.load(‘sen_32_32_8’) gives this error, running on an AWS AMI instance:

cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:87
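One common cause of that error is a checkpoint saved on a GPU ordinal that doesn't exist on the current machine; remapping storages with `map_location` when loading usually works around it. A hedged sketch in plain PyTorch (an in-memory buffer stands in for the weights file):

```python
import io
import torch

# Checkpoints remember which CUDA device they were saved on; remapping to CPU
# avoids "invalid device ordinal" when that device doesn't exist locally.
buf = io.BytesIO()
torch.save({'w': torch.ones(3)}, buf)
buf.seek(0)
state = torch.load(buf, map_location='cpu')
print(state['w'])  # tensor([1., 1., 1.])
```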

wds - weight decay(s)
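That is, the weight-decay coefficient(s) passed to `fit`; an iterable lets you give each layer group its own value. In plain PyTorch terms, weight decay is the L2-regularisation knob on the optimiser (a sketch of the underlying concept, not the fastai internals):

```python
import torch

params = [torch.nn.Parameter(torch.randn(2, 2))]
# weight_decay here plays the same role as the wds argument to fit
opt = torch.optim.SGD(params, lr=0.1, weight_decay=1e-4)
print(opt.param_groups[0]['weight_decay'])  # 0.0001
```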


Nope, but if you feel like training them, I’ll host the weights :slight_smile:


@jeremy this model uses 10 categories.
If I’m trying the dog breed competition, the satellite images one, or any other,
the number of categories will change.

What is the best way to change the number of categories here to fit the new task?
I remember that in Keras you pop the last layer and add a new fully connected layer with the number of categories needed.
fastai and PyTorch should have something similar.
Can you shed some light :flashlight: here?


@gerardo the number of classes is an argument of the resnext29_… methods (by default it’s set to 10). See, for example, the definition of resnext29_16_64(num_classes=10) in fastai/models/cifar10/

fastai will do it for you automagically :slight_smile: It handles the whole popping of the last layer and slapping a new layer on top with the correct dimensionality (aligned with the number of classes in your dataset).
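If you ever do need to do the Keras-style "pop and replace the head" by hand in PyTorch, it looks roughly like this (illustrative toy model, not the fastai code path):

```python
import torch
import torch.nn as nn

# A stand-in pretrained body ending in a 10-class head
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

# Replace the final layer with one sized for the new task, e.g. 120 dog breeds
in_features = model[-1].in_features
model[-1] = nn.Linear(in_features, 120)

out = model(torch.randn(4, 64))
print(out.shape)  # torch.Size([4, 120])
```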