Training a model from scratch: CIFAR 10

In which path are we supposed to store all the pre-trained model weights?

Fastai's pre-trained model weights are at fastai/weights, but some of the models use PyTorch's pre-trained weights (VGG, ResNet, etc.); those are stored at ~/.torch/models/
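If you want to double-check on your own box, something along these lines should show both locations (just a sketch - the paths are the ones mentioned above and may differ on your setup):

import os

# weights shipped with the fastai repo (resnext, inception, etc.) - relative to where the repo lives
print(os.listdir('fastai/weights'))

# PyTorch's pre-trained models (VGG, ResNet, ...) cached by torch
print(os.listdir(os.path.expanduser('~/.torch/models')))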


For SENet it's a bit different from all the other pre-trained weights, which are automatically loaded from the fastai/weights directory. Since for this model we are actually using learn.load to load the weights in, you'll need to place that particular weight file in the models folder inside your data directory.

You'll actually want it to be in the same place as your other model files for that dataset, which is f'{PATH}models'.
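So roughly (a sketch, assuming the weight file is the sen_32x32_8 one mentioned below, and that learn.load looks for a .h5 file in the same place learn.save writes one):

# expected layout: {PATH}models/sen_32x32_8.h5
learn.load('sen_32x32_8')   # resolves the file under f'{PATH}models'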


Yes exactly, that's what I was trying to say :slight_smile: I just wanted to point out that this is not the same location as for the weights of other pre-trained ImageNet models like inception4, resnext, etc., since those are all loaded from fastai/weights. (Of course, after you train them yourself, the new fine-tuned weights end up in f'{PATH}models'.)


@radek No magic here :frowning:

# assuming the usual fastai notebook imports, and that PATH and sz are already defined:
from fastai.conv_learner import *
from fastai.models.cifar10.senet import SENet18

m = SENet18()
bm = BasicModel(m.cuda(), name='cifar10_SENet18')

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(m, sz))
#data = ImageClassifierData.from_paths(PATH)

learn = ConvLearner(data, bm)
learn.unfreeze()
learn.load('sen_32x32_8')

learn.fit(0.01, 5)

@zpnc No num_classes argument is enabled on SENet18() :frowning:

This is predicting 10 classes

Out[24]:
array([[ -0.03931,  -3.25638, -11.64748, -13.02063, -13.74167, -14.32404, -13.12414, -16.31706, -15.34966, -13.79075],
       [ -0.00171,  -6.37512, -17.67231, -21.87936, -24.85791, -24.22245, -22.4937 , -26.0106 , -24.5726 , -23.42298],
       [ -0.00004, -10.18968, -25.22412, -33.11356, -37.17604, -35.85092, -32.23149, -38.18449, -36.37085, -34.89988],
       [ -0.00043,  -7.74531, -16.0502 , -23.37728, -22.80679, -24.57661, -22.05182, -25.2618 , -23.55353, -23.559  ],
       [ -0.18095,  -1.79912,  -9.98963, -12.32735, -12.66292, -13.37598, -10.96291, -14.3664 , -13.65081, -11.73629],
       [ -0.00154,  -6.4734 , -17.16404, -25.36037, -26.31315, -26.37898, -23.26433, -26.81647, -23.59966, -24.0616 ],
       [ -0.00007,  -9.62542, -25.56021, -32.92747, -35.47162, -34.49596, -31.37358, -36.26967, -33.80552, -32.72007],
       [ -0.00002, -10.69174, -23.49642, -33.88793, -34.70346, -35.1624 , -31.05718, -36.33319, -32.74025, -33.47608],
       [ -0.00613,  -5.09705, -13.74302, -17.84868, -18.88378, -19.37073, -16.35867, -20.6087 , -19.64924, -18.3053 ],
       [ -0.08595,  -2.49756, -10.14199, -12.39996, -12.31104, -13.47758, -12.54866, -13.82268, -12.55679, -11.21001]], dtype=float32)
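For what it's worth, those look like log-probabilities (the first column is near 0, i.e. probability near 1), so a quick sanity check that you really have 10 classes could be something like this (a sketch, assuming the array above came from learn.predict()):

import numpy as np

log_preds = learn.predict()             # log-probabilities over the 10 CIFAR10 classes, shape (n_images, 10)
probs = np.exp(log_preds)               # back to probabilities
preds = np.argmax(log_preds, axis=1)    # predicted class index per image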

Hey @gerardo - I am sorry, you were right and I was wrong. Apparently, there are many ways to skin a cat using the fastai library.

The difference is in how our model gets constructed here - apparently there is a class called BasicModel, and by instantiating it we are doing part of the job that calling ConvLearner.pretrained does for us when doing transfer learning…

The way to solve this (I think - I can't look at the code right now, so I'm speaking from memory) is to look at ConvBuilder in learner.py and redo the relevant steps from its init function on our model. What we are doing here is way cooler than in part 1 v1, as there is an adaptive pooling layer (@jeremy's invention? a very neat thing indeed! :slight_smile: ) that we slap onto the convolutional part of the model - it is a concatenation of doing avg pool and max pool on the earlier layer, so we get the best of both worlds!
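The pooling trick from the paragraph above, written out in plain PyTorch, is roughly this (just a sketch of the concept - fastai has its own layer for it, AdaptiveConcatPool2d if I remember the name correctly):

import torch
import torch.nn as nn

class ConcatPool2d(nn.Module):
    # concatenate adaptive avg pooling and adaptive max pooling over the last conv feature map
    def __init__(self, output_size=1):
        super().__init__()
        self.avg = nn.AdaptiveAvgPool2d(output_size)
        self.max = nn.AdaptiveMaxPool2d(output_size)

    def forward(self, x):
        # best of both worlds: avg-pooled and max-pooled features, stacked along the channel dim
        return torch.cat([self.avg(x), self.max(x)], dim=1)

Note that because of the concatenation, the linear layer that follows gets twice as many input features as the conv part outputs.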

I cannot look at this right now and have one other notebook I need to finish one thing on, but the moment I jump back to using the fastai library for the deep learning part - which should be tonight hopefully, or tomorrow at the latest - I will figure this out and post here, unless someone else beats me to it :slight_smile: Or maybe the steps in learner.py that I reference would suffice to reproduce this?

Anyhow - if this gets cleared up before I get a chance to figure it out myself and post here, please let me know :slight_smile: Otherwise, I will correct the incorrect answer I gave you last time by posting the steps for how to change the model as soon as I feasibly can.

I tried using the SENet weight file that was pre-trained on CIFAR10 but couldn't get it to work… so of course I decided to train it from scratch :slight_smile: It's really cool/awesome to run through Jeremy's notebook just to see the progression from a loss of ~2 on 8x8 images up to state-of-the-art results! If anyone has the time, I would recommend doing it - I think it helps build up an "intuition"

Anyway, if anyone runs into similar issues with that weight file, feel free to reach out - you can also try using mine.


I'm in the process of training the resnext 29 8 64 from scratch to get experience and intuition for cases (tasks) where no one will provide the almost-perfect weights :wink:

The process takes time and I have to ‘interrupt’ it to stop the instance and get some sleep. @jeremy do you get any sleep at all? The amount of work that you do is just amazing :smiley:

But seriously: if you have to 'interrupt' the process, don't forget, the next time you load your intermediate model, to unfreeze it before you continue training. In addition, be sure to set the data set with the correct image size.
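Something like this is what I mean - a rough sketch only, with placeholder names for the saved weights, the architecture variable (arch) and the learning rate (lr):

sz = 32                                  # whatever image size you were training at when you stopped
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn.set_data(data)                     # point the learner at data with the correct image size
learn.load('resnext_intermediate')       # placeholder name for your saved intermediate weights
learn.unfreeze()                         # don't forget this before continuing training
learn.fit(lr, 1, cycle_len=1)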


@jeremy towards the end of your training of the resnext model in cifar10.ipynb (image size 32x32), the model looks like it is severely overfitting (training loss < 0.10; validation loss between ~0.2 and 0.3), but you kept on fitting. Does this mean that we should ignore overfitting as long as the validation loss keeps decreasing?

Here’s my CIFAR10 from scratch with a Custom PyTorch Model. Based on Examples from Jeremy, Yannet and my fellow learners :slight_smile:

Appreciate any feedback. The goal was not to create the best model, but to show yet another example of how you could create a custom model and integrate it with the Learner.


@gerardo I think that this is what you may have been looking for :slight_smile: Great stuff @ramesh, thx for sharing this :slight_smile:

We have feature requests to add Early Stopping and Model Checkpoints to prevent overfitting. You can find more here, under the Learner Fit Options section of the wiki: Fastai Library Feature Requests.

I would like to know the thoughts on when to stop the model fitting process.

Maybe have an early_stopping_rounds (as in XGBoost) / patience (as in Keras) kind of parameter that terminates the training process once the validation loss doesn't decrease for a given number of rounds/epochs? This seems to work more often than not on many problems, and many people are familiar with this API too. Do you suggest any better approaches?

EDIT: loss -> validation loss.
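Just to illustrate the kind of behaviour I mean, a patience-based stop could look roughly like this (plain Python sketch, not an existing fastai option; train_one_epoch and compute_val_loss are hypothetical helpers):

patience, max_epochs = 5, 100
best_val_loss, epochs_without_improvement = float('inf'), 0

for epoch in range(max_epochs):
    train_one_epoch(model)                  # hypothetical helper: one pass over the training set
    val_loss = compute_val_loss(model)      # hypothetical helper: loss on the validation set
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0      # improvement -> reset the counter
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                           # stop once val loss hasn't improved for `patience` epochs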


I think this is coming up in the course when we discuss creating our own parts of models, or entire models. I started typing out a very long response again, but the way @jeremy walks through the methodology of going about this (starting with overfitting and then working towards a really well-trained model) is imho outstanding :slight_smile:

I would not worry about overfitting as such just yet and in general would defer to @jeremy with regards to what the best course of action should be here.

BTW, a gentleman who goes by the nick @smerity on Twitter wrote this super awesome post on model architecture and simplicity, and I think if you combine the walkthrough of the methodology with the contents of this article, you will be in deep learning practitioner paradise :smiley:


Which loss do you look at for not decreasing? With overfitting, you will have a U-shaped loss curve on your validation set, and the train set loss will stop decreasing only when you overfit.

But if you stop there at the bottom of the validation loss valley, this is also not great - you are completely skipping all the goodies that SGDR gives us in terms of finding a nice spot in the weight space.

But the main concern is - if you do early stopping, how do you know it is the best the model could get? It's sort of like keeping your fingers crossed that, by training a model with an incompatible architecture and stopping early, you will somehow arrive at the best answer you could get on a different but related question that the model was not designed to answer :slight_smile:

What @jeremy has for us is 1000x better imho, and I never came across this methodology being shared anywhere outside of fastai :slight_smile: I hear it is known in the deep learning community, but I literally heard about it first from @jeremy and never heard about it anywhere else.

@ramesh This looks great! Thanks for sharing.

Well, I agree with you. This line makes a lot of sense!

Has anyone tried to reach >0.9 accuracy by starting with 32x32 images from the beginning? I would like to understand what it takes (how many epochs, learning rate schedule, etc.) to get to the same accuracy as we do now (start training with 8x8, move to 16x16, and so on) in ~40-50 epochs in total. If I remember correctly, the authors of the cifar10 resnext model trained for 300 epochs. To me it looks like the model learns faster when you gradually increase the image size.
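For reference, the gradual-resizing schedule I mean is roughly this (a sketch of what the notebook does, with placeholder sizes and cycle counts):

# train on small images first, then reuse the same weights on progressively larger ones
for sz, n_cycle in [(8, 2), (16, 2), (32, 4)]:                  # placeholders, not the notebook's exact numbers
    data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), bs=bs)
    learn.set_data(data)                                        # same learner/weights, larger images
    learn.fit(lr, n_cycle, cycle_len=1)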

We also use SGDR, which is known for speeding things up quite a bit :slight_smile:
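i.e. the cycle arguments to fit, something along these lines (the numbers are placeholders):

# SGDR: restart the learning rate at the start of each cycle, doubling the cycle length each time
learn.fit(1e-2, 3, cycle_len=1, cycle_mult=2)   # 3 cycles of 1, 2 and 4 epochs -> 7 epochs total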

Nonetheless, such a comparison would be cool :slight_smile: I think this could even be pulled off using this notebook, just without the cycles, etc., and also comparing it against not resizing the images.