Training a model from scratch: CIFAR10

I tried using the SENet weight file that was pre-trained on CIFAR10 but couldn’t get it to work... so of course I decided to train it from scratch :slight_smile: It’s really awesome to run through Jeremy’s notebook just to see the progression from a loss of around 2 on 8x8 images up to state-of-the-art results! If you have the time, I’d recommend doing it; I think it helps build intuition.

Anyway, if anyone runs into similar issues with that weight file, feel free to reach out. You can also try using mine.


I’m in the process of training the resnext29_8_64 model from scratch to get experience and intuition for tasks where no one will provide almost-perfect weights :wink:

The process takes time and I have to ‘interrupt’ it to stop the instance and get some sleep. @jeremy do you get any sleep at all? The amount of work that you do is just amazing :smiley:

But seriously: if you have to ‘interrupt’ the process, remember that the next time you load your intermediate model you need to unfreeze it before you continue training. Also, be sure to set the dataset back to the correct image size.
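One low-tech way to avoid forgetting these steps is to save a small metadata file next to each intermediate checkpoint recording the image size and whether the model was unfrozen, then read it back when you resume. A minimal sketch, assuming you handle the actual load/unfreeze/set-data calls yourself; the helpers `save_run_state`, `load_run_state` and `resume_checklist` and the file name are hypothetical, not part of fastai.

```python
import json
import tempfile
from pathlib import Path

def save_run_state(path, image_size, unfrozen, epochs_done):
    """Record the settings a resumed run must restore (hypothetical helper)."""
    state = {"image_size": image_size, "unfrozen": unfrozen, "epochs_done": epochs_done}
    Path(path).write_text(json.dumps(state))

def load_run_state(path):
    """Read the saved settings back from disk."""
    return json.loads(Path(path).read_text())

def resume_checklist(state):
    """Turn the saved settings into the two reminders from the post."""
    steps = [f"rebuild the data at {state['image_size']}x{state['image_size']}"]
    if state["unfrozen"]:
        steps.append("unfreeze the model before fitting")
    return steps

# Example: save before stopping the instance, reload the next morning.
with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp) / "resnext_sz32.json"
    save_run_state(meta, image_size=32, unfrozen=True, epochs_done=12)
    state = load_run_state(meta)

print(resume_checklist(state))
```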


@jeremy towards the end of training the ResNeXt model in cifar10.ipynb (image size 32x32), the model appears to be severely overfitting (training loss < 0.10; validation loss between ~0.2 and 0.3), but you kept on fitting. Does this mean that we should ignore overfitting as long as the validation loss keeps decreasing?

Here’s my CIFAR10-from-scratch attempt with a custom PyTorch model, based on examples from Jeremy, Yannet and my fellow learners :slight_smile:

I’d appreciate any feedback. The goal was not to create the best model, but to show yet another example of how you can create a custom model and integrate it with the Learner.


@gerardo I think this may be what you were looking for :slight_smile: Great stuff @ramesh, thanks for sharing this :slight_smile:

We have feature requests to add Early Stopping and Model Checkpoints to prevent overfitting. You can find more in the Wiki: Fastai Library Feature Requests thread, under Learner Fit Options.

I’d like to hear people’s thoughts on when to stop the model fitting process.

Maybe have a parameter like early_stopping_rounds (XGBoost) or patience (Keras) that terminates training once the validation loss hasn’t decreased for a given number of rounds/epochs? This works more often than not on many problems, and many people are already familiar with this API. Do you suggest any better approaches?
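For reference, the patience idea can be sketched in a few lines of plain Python. This is a hypothetical helper illustrating the concept behind Keras’ `patience` and XGBoost’s `early_stopping_rounds`, not either library’s implementation and not a fastai API; the validation losses are made up.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: a plateau, then a later improvement.
val_losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.5]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_epoch, stopper.best)  # 5 2 0.6
```

With patience=3 this stops at epoch 5 and never sees the 0.5 at epoch 6, which shows how much the result depends on the patience you pick.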

EDIT: loss -> validation loss.


I think this is coming in the course when we discuss creating our own parts of models or entire models. I started typing out a very long response again, but the way @jeremy walks through the methodology of going about this (starting with overfitting and then working towards a really well-trained model) is imho outstanding :slight_smile:

I would not worry about overfitting as such just yet, and in general would defer to @jeremy with regard to what the best course of action should be here.

BTW, a gentleman who goes by @smerity on Twitter wrote this super awesome post on model architecture and simplicity, and I think if you combine the walkthrough of the methodology with the contents of this article, you will be in deep learning practitioner paradise :smiley:


Which loss are you watching to see whether it stops decreasing? With overfitting, you will see a U-shaped loss curve on your validation set, and the training loss will only stop decreasing once you have overfit.
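A large train/validation gap on its own is not the U shape; what signals overfitting is the validation loss turning back up while the training loss keeps falling. A minimal post-hoc sketch, assuming you have per-epoch loss lists; `diagnose` is a hypothetical helper and the numbers are illustrative, not from the notebook.

```python
def diagnose(train_losses, val_losses):
    """Find the bottom of the validation-loss 'U' and check for overfitting after it."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    # Overfitting signature: val loss climbed back up while train loss kept falling.
    overfitting = (val_losses[-1] > val_losses[best_epoch]
                   and train_losses[-1] < train_losses[best_epoch])
    gap = val_losses[-1] - train_losses[-1]  # final train/validation gap
    return best_epoch, overfitting, gap

# Illustrative numbers, not from the notebook.
train = [1.2, 0.8, 0.5, 0.3, 0.2, 0.1]
val = [1.3, 0.9, 0.7, 0.65, 0.75, 0.9]
print(diagnose(train, val))
```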

But stopping at the bottom of the validation loss valley is not great either: you completely skip all the goodies that SGDR gives us in terms of finding a nice spot in the weight space.
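To see why the first valley is not the end of the story, it helps to look at the SGDR learning-rate schedule itself: within each cycle the rate is cosine-annealed from a maximum down to a minimum, then jumps back up (a warm restart), and with a cycle multiplier each cycle is longer than the previous one. A simplified sketch of that schedule; the argument names `cycle_len` and `cycle_mult` echo fastai’s `fit` arguments, but this function is not fastai’s implementation, and it assumes `cycle_len > 0` and `cycle_mult >= 1`.

```python
import math

def sgdr_lr(step, lr_max, lr_min=0.0, cycle_len=1, cycle_mult=1):
    """Learning rate at `step` (in epochs, may be fractional) under SGDR.

    Cosine-anneals from lr_max to lr_min over each cycle, then restarts at
    lr_max; each cycle is `cycle_mult` times longer than the one before.
    """
    t, length = step, cycle_len
    while t >= length:          # skip past completed cycles
        t -= length
        length *= cycle_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / length))

# Cycles of 1, 2 and 4 epochs: restarts happen at epochs 1 and 3.
for step in [0, 0.5, 1, 2, 3, 5]:
    print(step, round(sgdr_lr(step, 0.1, cycle_mult=2), 4))
```

Each restart kicks the weights out of the current valley, so a rule that stops at the first minimum never gets to use the later, longer cycles.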

But the main concern is this: if you stop early, how do you know that is the best the model could get? It’s a bit like keeping your fingers crossed that stopping early, while training a model whose architecture doesn’t fit the problem, will somehow get you the best possible answer to a different but related question that the model was not designed to answer :slight_smile:

What @jeremy has for us is 1000x better imho, and I had never come across this methodology being shared anywhere outside of fastai :slight_smile: I hear it is known in the deep learning community, but I literally heard about it first from @jeremy and nowhere else.

@ramesh This looks great! Thanks for sharing.

Well, I agree with you. This line makes a lot of sense!

Has anyone tried to reach >0.9 accuracy by starting with 32x32 images from the beginning? I would like to understand what it takes (how many epochs, learning rate schedule, etc.) to get to the same accuracy as we do now (start training with 8x8, move to 16x16, and so on) with ~40-50 epochs in total. If I remember correctly, the authors of the CIFAR10 ResNeXt model trained for 300 epochs. It looks to me like the model learns faster when we gradually increase the image size.
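To compare a 32x32-only run with progressive resizing on an equal footing, it helps to count epochs the same way for both. Under SGDR with cycle multiplier m, n cycles of base length L take L(1 + m + ... + m^(n-1)) epochs. A minimal sketch; the `(n_cycle, cycle_len, cycle_mult)` naming mirrors fastai’s `fit` arguments, and the plan below is made up for illustration, not the schedule used in cifar10.ipynb.

```python
def sgdr_epochs(n_cycle, cycle_len=1, cycle_mult=1):
    """Epochs used by n_cycle SGDR cycles: cycle_len * (1 + m + m**2 + ...)."""
    return sum(cycle_len * cycle_mult ** i for i in range(n_cycle))

# Hypothetical progressive-resizing plan: (image size, n_cycle, cycle_len, cycle_mult).
# The numbers are illustrative, not the notebook's actual schedule.
plan = [(8, 3, 1, 2), (16, 3, 1, 2), (24, 2, 1, 2), (32, 3, 1, 2)]

total = 0
for size, n_cycle, cycle_len, cycle_mult in plan:
    epochs = sgdr_epochs(n_cycle, cycle_len, cycle_mult)
    total += epochs
    print(f"{size}x{size}: {epochs} epochs")
print("total:", total)
```

A 32x32-only baseline is then a single-entry plan, and the two runs can be compared at the same total epoch budget.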

We also use SGDR, which is known for speeding things up quite a bit :slight_smile:

Nonetheless, such a comparison would be cool :slight_smile: I think it could even be pulled off with this notebook by dropping the cycles etc., and also comparing against not resizing the images at all.

Yes, you’re correct @radek, we’re also using SGDR. To rephrase my question: how many epochs does it take to reach the same accuracy using SGDR, but starting with 32x32?

I agree with you that it’s easy to test this with the current notebook, but there are so many things to do and so little time :slight_smile:


Absolutely, I can relate to that :slight_smile:

I am not even sure the same results can be achieved without the resizing, but it would be really cool to see the comparison!

I modified the code in models/cifar10/ so that we are able to pass in the number of classes we want.

Previous: def __init__(self, block, num_blocks, num_classes=10)
Now: def __init__(self, block, num_blocks, num_classes)

Previous: def SENet18(): return SENet(PreActBlock, [2,2,2,2])
Now: def SENet18(num_classes=10): return SENet(PreActBlock, [2,2,2,2], num_classes)

After that it works fine for me on the iceberg challenge.


Absolutely yes! I was also surprised at how much overfitting this training seemed to handle.

Did you use the pre-trained version? I tried the same change, but it only works when training the model from scratch.

It didn’t work for me either when I used the pretrained version. It seems the weights were saved for a fixed number of classes (10 vs 2).
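One common workaround is to load only the pretrained weights whose names and shapes match the new 2-class model, so the final classifier keeps its fresh initialization. A minimal sketch, assuming the pretrained file is a PyTorch state_dict; `compatible_weights` is a hypothetical helper, and the layer names below (including `linear` for the classifier) are assumptions for illustration. Shape-only stand-ins replace real tensors so the example runs without PyTorch.

```python
from types import SimpleNamespace

def compatible_weights(pretrained, own):
    """Keep pretrained entries whose name and shape both match the new model.

    Both arguments are name -> tensor mappings such as PyTorch state_dicts;
    anything exposing a `.shape` works.
    """
    return {name: w for name, w in pretrained.items()
            if name in own and tuple(w.shape) == tuple(own[name].shape)}

# Stand-ins for tensors (only .shape matters here); names are hypothetical.
pretrained = {"conv1.weight": SimpleNamespace(shape=(64, 3, 3, 3)),
              "linear.weight": SimpleNamespace(shape=(10, 512)),
              "linear.bias": SimpleNamespace(shape=(10,))}
own = {"conv1.weight": SimpleNamespace(shape=(64, 3, 3, 3)),
       "linear.weight": SimpleNamespace(shape=(2, 512)),
       "linear.bias": SimpleNamespace(shape=(2,))}

kept = compatible_weights(pretrained, own)
print(sorted(kept))  # ['conv1.weight']
```

With real tensors you would then call `model.load_state_dict(kept, strict=False)`. `strict=False` tolerates the missing classifier keys, but a key that is present with the wrong shape still raises a size-mismatch error, which is why the filter is needed.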

Yeah, interestingly, when I used the pre-trained version without changing num_classes, I do get 10 predictions per image, but the first two, corresponding to classes 0 and 1, seem to be accurate. Even though it’s a bit unorthodox, I wonder if we could just grab those first two predictions and use them on binary problems like iceberg.
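If you try the first-two-outputs trick, it is worth renormalizing those two outputs rather than reading them as-is, since their 10-way probabilities will not sum to 1. A minimal sketch; `two_class_probs` is a hypothetical helper and the output vector is made up. Keep in mind that outputs 0 and 1 were trained as CIFAR10’s airplane and automobile classes, so treat this as an experiment rather than an established recipe.

```python
import math

def two_class_probs(logits, idx=(0, 1)):
    """Softmax over just two of the model's outputs, giving a binary (p0, p1).

    Equivalent to taking the full 10-way softmax and renormalizing those two
    entries, so the pair sums to 1. Works on raw logits or log-probabilities,
    since a constant shift cancels out.
    """
    a, b = logits[idx[0]], logits[idx[1]]
    m = max(a, b)                          # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)

# Made-up 10-way output for one image.
out = [2.0, 0.0, -1.0, 0.5, -0.3, 0.1, -2.0, 0.0, 1.0, -0.5]
print(two_class_probs(out))
```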