Training a model from scratch: CIFAR 10

@ecdrid To train from scratch, just follow all of the same steps except remove this line:

learn.load('sen_32x32_8')

This line was there to load the pre-trained model that had already been trained on CIFAR-10. Since you are training from scratch, you don’t need it.

When you are done training and want to save the model, just do learn.save('senet_model')
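For reference, the whole from-scratch flow is just the following sketch; the learning rate and cycle settings are placeholders (use whatever you already had in your notebook), and the skipped load plus the final save are the only point:

# build `learn` exactly as in the notebook, then:
# learn.load('sen_32x32_8')        # <- skip this line when training from scratch
learn.fit(1e-2, 3, cycle_len=1)    # placeholder hyperparameters, tune as needed
learn.save('senet_model')          # can be restored later with learn.load('senet_model')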


Just wanted to know: why does the training loss increase abruptly when I increase the number of epochs, even though the model was performing much better just before the new learn.fit() call? This is given that I don’t seem to be overfitting at all.

List of benchmarks for CIFAR-10 (after MNIST).

http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html


@jeremy
Can you provide the architecture of the SENet you used for training on CIFAR-10?

Check out the forums…
They have the information…

@jeremy @ramesh: I can see that Jeremy’s fix is on my box. However, I still get the same error, which goes away on applying @jakcycsl’s hack.

Can you please tell me if I need to make changes to my ipynb (perhaps define ‘storage’)?

Or is Jeremy’s fix applicable only to models saved after the fix?

What’s the intuition behind starting to train the model at smaller image sizes?

  1. Training starts faster in the early stages?
  2. It is less prone to overfitting, as mentioned in the course?

Hope someone can shed some light or point me to something I can read. Thank you!

The way I think about it: smaller images allow the model to first learn the overall structure, the big picture. As we increase the size, it starts filling in the details within that structure.

This might be easier to do than trying to infer the big picture from a detailed view. And getting the big picture right is what I suppose helps with generalization.

But this is just my intuition. It makes sense to me that we might want our model to first learn shapes and only then the texture. But I would take this reasoning with a grain of salt :slight_smile:

The main consideration here is whether this layered approach produces better results than training at the full image size from the start.
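In code the idea is just a loop over sizes. A minimal sketch, assuming a get_data(sz, bs) helper like the ones in the course notebooks that rebuilds the data at a given image size, and that the Learner exposes set_data as in the old library; the schedule and hyperparameters below are made up for illustration:

for sz in (8, 16, 24, 32):                # example size schedule, not the exact one used
    learn.set_data(get_data(sz, bs=128))  # swap in images resized to sz x sz
    learn.fit(1e-2, 2, cycle_len=1)       # train for a bit at this size, then grow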


Yeah, that sounds about right. But we don’t have rigorous experiments yet to know whether this gives better accuracy in the end, speeds up the time to get to some accuracy level, or whether it doesn’t really help at all - it’s something that some of the part 2 students are planning experiments on at the moment.

That is really great to hear! :slight_smile: Looking forward to the outcomes.

Thanks for the explanation. I assume this would only make sense if the image is being resized/rescaled when setting different “sz” values. If it’s just doing a random “sz × sz” crop, it would only be learning part of the image instead of learning the big picture. I’m going to start digging into the fastai transforms source code later.

P.S. Is there a specific reason for setting the padding to “sz//8”?

Yes! :slight_smile: This is the augmentation (plus random horizontal flipping) that was used in many papers that train on CIFAR-10. For instance, the ResNet paper uses this data augmentation.

I played around with this some time ago and this is what the augmented images look like:
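For anyone who wants to reproduce it outside fastai, here is roughly the same pipeline written with torchvision for illustration (with sz = 32, the padding sz//8 comes out to 4 pixels):

import torchvision.transforms as T

# Standard CIFAR-10 augmentation from the ResNet paper:
# pad by 4 pixels (32 // 8), take a random 32x32 crop, flip horizontally at random.
train_tfms = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])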


Any insights about the massive differences in training speed here, on the first learn.fit of the cifar10 notebook?

Whatever Jeremy is running is taking 40.8 seconds to run, with a batch size of 128. My old GTX 650 Ti with only 2GB RAM, and with batch size reduced to 32 to cope, takes almost 15 minutes!

Of course I intend to upgrade when I can, but what would I expect to get if I chose a 1080 Ti - would it be anywhere near what Jeremy shows I wonder?

Mine:
[screenshot of my timing output]

Jeremy’s output - and it shows CPU times…
[screenshot of Jeremy’s timing output]

Should be identical - that’s what I used.


Thanks Jeremy - that’s encouraging to know! My current system can take a 1070 Ti without any further hardware upgrades - how much slower do you think that would be?

I tried to implement the entire process from scratch with PyTorch. Using SENet, the best accuracy I could get is only around 93.2%. Are there any tricks in the fastai library that could push the result even higher?

My training process has:

  • Data augmentation (random flips, random crops with padding)
  • Weight decay and momentum
  • SGDR - implemented using the LambdaLR class from PyTorch; a bad idea, but it worked (see the sketch after this list)
  • Snapshot ensembles - helped a little with the results I submitted to Kaggle
  • Learner class - a poor man’s version just to make it easier to change the data size
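For the SGDR point above, a minimal sketch of cosine annealing with warm restarts built on LambdaLR; the toy model, cycle length and learning rate are placeholders, and it omits the cycle-length multiplier from the SGDR paper:

import math
from torch import nn, optim
from torch.optim.lr_scheduler import LambdaLR

model = nn.Linear(10, 10)  # toy model, stands in for the real network
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

cycle_len = 10  # restart the learning rate every 10 epochs (placeholder)

def sgdr_factor(epoch):
    # cosine decay from 1.0 towards 0.0 within each cycle, then jump back up
    t = epoch % cycle_len
    return 0.5 * (1 + math.cos(math.pi * t / cycle_len))

scheduler = LambdaLR(optimizer, lr_lambda=sgdr_factor)

for epoch in range(3 * cycle_len):
    # ... run one epoch of training here ...
    scheduler.step()  # updates the lr multiplier for the next epoch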

What else should I be doing? Here’s my notebook. I would really appreciate some input.

Another note: it seems that whether I use differential data-size training (is there a term for this technique?) or not, the end results are similar to the ones I get training at 32x32 right from the start (around 93.2%). However, it does seem better at maintaining a balance between train loss and val loss in the early stages. Is there any way to take advantage of this?


I found that senet154 has been added to the fastai library, but I don’t know how to use it.
The original method is not available:
arch = senet154
learn = ConvLearner.pretrained(arch, data)
What should I do?
Thanks!
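In fastai v1 the ConvLearner.pretrained API is gone. One rough workaround - a sketch, not the official fastai recipe - is to build the model yourself with the Cadene pretrainedmodels package and hand it to a plain Learner; it assumes `data` is an ImageDataBunch you have already built with an ImageNet-style image size (e.g. 224), since senet154 expects large inputs:

import pretrainedmodels
from torch import nn
from fastai.vision import *   # fastai v1 style import; gives Learner, accuracy, etc.

# ImageNet-pretrained senet154, with a new head for your number of classes
model = pretrainedmodels.senet154(num_classes=1000, pretrained='imagenet')
model.last_linear = nn.Linear(model.last_linear.in_features, data.c)

learn = Learner(data, model, metrics=accuracy)
learn.fit_one_cycle(1)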

Hi! Really nice work. I am a bit late, and I don’t know if this comment will still be accurate since the library has evolved a lot since then, but the default loss function for training given by ImageDataBunch is now
FlattenedLoss of CrossEntropyLoss(), which wraps PyTorch’s CrossEntropyLoss. According to the docs:

This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

So I don’t think the LogSoftmax() activation at the end of your model is necessary, because you would end up with two activation functions. Again, I don’t know if this is relevant to previous versions of fastai, but it could maybe help other people running into this issue today.
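A tiny illustration of the point in plain PyTorch (the model and shapes are made up):

import torch
from torch import nn

# With nn.CrossEntropyLoss the model should end in raw logits:
# the loss itself applies LogSoftmax + NLLLoss internally.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # no LogSoftmax at the end
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
loss = loss_fn(model(x), y)

# If you do keep nn.LogSoftmax(dim=1) as the last layer,
# pair it with nn.NLLLoss() instead of CrossEntropyLoss.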

May I ask how to load a SENet model in fastai v1? When I followed your code, it says:
ModuleNotFoundError: No module named ‘fastai.models’
:worried: