Training a model from scratch: CIFAR 10

I tried using the SENet weight file that was pre-trained on CIFAR10 but couldn’t get it to work… so of course I decided to train it from scratch :slight_smile: It’s really cool to run through Jeremy’s notebook just to see the progression from a loss of ~2 on 8x8 images up to state-of-the-art results! If anyone has the time, I would recommend doing it; I think it helps build up an “intuition”.

Anyway, if anyone runs into similar issues with that weight file, feel free to reach out; you can also try using mine.


I’m in the process of training the resnext 29 8 64 model from scratch to get experience and intuition for cases (tasks) where no one will provide the almost-perfect weights :wink:

The process takes time and I have to ‘interrupt’ it to stop the instance and get some sleep. @jeremy do you get any sleep at all? The amount of work that you do is just amazing :smiley:

But seriously: if you have to ‘interrupt’ the process, don’t forget to unfreeze your intermediate model the next time you load it, before you continue training. In addition, be sure to set the dataset to the correct image size.
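Roughly, the resume step looks like the sketch below. This is just a sketch against the fastai 0.7-style API from the course notebooks; the checkpoint name and the get_data(sz, bs) helper are assumptions borrowed from the cifar10 notebook, and bs / lr are defined earlier there.

```python
# Sketch of resuming an interrupted run (fastai 0.7-style API).
# 'cifar10_32_tmp' and get_data(sz, bs) are assumed from the cifar10 notebook.
learn.load('cifar10_32_tmp')      # load the intermediate weights you saved
learn.unfreeze()                  # make sure all layers are trainable again
learn.set_data(get_data(32, bs))  # rebuild the data at the right image size
learn.fit(lr, 1, cycle_len=1)     # continue training as before
```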


@jeremy towards the end of training the resnext model in cifar10.ipynb (image size 32x32), the model looks like it is severely overfitting (training loss < 0.10; validation loss between ~0.2 and 0.3), but you kept on fitting. Does this mean that we should ignore overfitting as long as the validation loss keeps decreasing?

Here’s my CIFAR10 from scratch with a custom PyTorch model, based on examples from Jeremy, Yannet and my fellow learners :slight_smile:

Appreciate any feedback. The goal was not to create the best model, but to show yet another example of how you can create a custom model and integrate it with the Learner.
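For anyone who hasn’t opened the notebook yet, the basic pattern is roughly the sketch below. It is a minimal sketch, not the actual model from the notebook: the SimpleCNN module is just an illustration, and it assumes the fastai 0.7-style ConvLearner.from_model_data entry point plus an ImageClassifierData object called data built as in the cifar10 notebook.

```python
from fastai.conv_learner import *   # fastai 0.7-style imports
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    """A tiny illustrative CIFAR10 model (not the one from the notebook)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool(x).view(x.size(0), -1)
        return F.log_softmax(self.fc(x), dim=-1)

# 'data' is an ImageClassifierData built earlier, as in the cifar10 notebook
learn = ConvLearner.from_model_data(SimpleCNN(), data)
learn.fit(1e-2, 2, cycle_len=1)
```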


@gerardo I think that this is what you may have been looking for :slight_smile: Great stuff @ramesh, thx for sharing this :slight_smile:

We have feature requests to add Early Stopping and Model Checkpoints to prevent overfitting. You can find more under the Learner Fit Options section of the wiki: Fastai Library Feature Requests.

I would like to hear your thoughts on when to stop the model fitting process.

Maybe have an early_stopping_rounds (as in XGBoost) / patience (as in Keras) kind of parameter that terminates training once the validation loss doesn’t decrease for a given number of rounds / epochs? This seems to work more often than not on many problems, and many people are familiar with this API too. Do you suggest any better approaches?

EDIT: loss -> validation loss.
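Concretely, the usual patience logic is something like the sketch below; train_one_epoch(), validate() and save_checkpoint() are hypothetical placeholders for whatever your actual training, validation and saving steps are.

```python
# Generic patience-based early stopping (sketch). train_one_epoch(),
# validate() and save_checkpoint() are hypothetical placeholders.
max_epochs, patience = 50, 5
best_val_loss, bad_epochs = float('inf'), 0

for epoch in range(max_epochs):
    train_one_epoch()
    val_loss = validate()
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
        save_checkpoint()              # keep the best weights seen so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:     # no improvement for `patience` epochs
            break
```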


I think this is coming up in the course when we discuss creating our own parts of models, or entire models. I started typing out a very long response again, but the way @jeremy walks through the methodology of going about this (starting with overfitting and then working towards a really well-trained model) is imho outstanding :slight_smile:

I would not worry about overfitting as such just yet and in general would defer to @jeremy with regards to what the best course of action should be here.

BTW, a gentleman who goes by the nick @smerity on Twitter wrote this super awesome post on model architecture and simplicity, and I think if you combine the walkthrough of the methodology with the contents of that article, you will be in deep learning practitioner paradise :smiley:


Which loss do you look at for ‘not decreasing’? With overfitting you will have a U-shaped loss on your validation set, and the training set loss will stop decreasing only once you overfit.

But if you stop there at the bottom of the validation loss valley, this is also not great - you are completely skipping all the goodies that SGDR gives us in terms of finding a nice spot in the weight space.
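For reference, that SGDR schedule is driven by the cycle arguments to fit, roughly like this (just a sketch with example values for the learning rate and cycle counts):

```python
# SGDR-style restarts via fastai 0.7's fit() cycle arguments (example values).
# 3 cycles of lengths 1, 2 and 4 epochs: the LR anneals within each cycle and
# jumps back up at each restart, nudging the weights out of sharp minima.
learn.fit(1e-2, 3, cycle_len=1, cycle_mult=2)
```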

But the main concern is: if you do early stopping, how do you know it is the best the model could get? It’s sort of like keeping your fingers crossed that, by stopping early while training a model of an incompatible architecture, you will somehow arrive at the best answer you could get on a different but related question that the model was not designed to answer :slight_smile:

What @jeremy has for us is 1000x better imho, and I never came across this methodology being shared anywhere outside of fastai :slight_smile: I hear it is known in the deep learning community, but I literally heard about it first from @jeremy and nowhere else.

@ramesh This looks great! Thanks for sharing.

Well, I agree with you. This line makes a lot of sense!

Has anyone tried to reach >0.9 accuracy by starting with 32x32 images from the beginning? I would like to understand what it takes (how many epochs, learning rate schedule, etc.) to get to the same accuracy as we do now (start training with 8x8, move to 16x16, and so on) with ~40-50 epochs in total. If I remember correctly, the authors of the cifar10 resnext model trained for 300 epochs. To me it looks like the model learns faster when the image size is increased gradually.
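For anyone who wants to try the comparison, the progressive-resizing side is roughly the loop below. This is only a sketch, with illustrative sizes and epoch counts, and it assumes the get_data(sz, bs) helper from the cifar10 notebook.

```python
# Progressive resizing (sketch): train at increasing image sizes,
# reusing the same learner. get_data(sz, bs) is the notebook's helper;
# sizes and epoch counts here are just illustrative.
for sz in (8, 16, 24, 32):
    learn.set_data(get_data(sz, bs))
    learn.fit(1e-2, 2, cycle_len=1)

# The fixed-size baseline would simply stay at 32x32 for the whole budget:
# learn.set_data(get_data(32, bs)); learn.fit(1e-2, n_cycles, cycle_len=1)
```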

We also use SGDR, which is known for speeding things up quite a bit :slight_smile:

Nonetheless, such a comparison would be cool :slight_smile: I think this could even be pulled off using this notebook, just without the cycles, etc., and also comparing it against not resizing the images at all.

Yes, you’re correct @radek, we’re also using SGDR. To rephrase my question: how many epochs does it take to reach the same accuracy using SGDR, but starting with 32x32?

I agree with you that it’s easy to test this with the current notebook, but there are so many things to do and so little time :slight_smile:


Absolutely, I can relate to that :slight_smile:

I am not even sure if the same results can be achieved without the resizing, but it would be really cool to see the comparison!

I modified the code in models/cifar10/senet.py so that we are able to pass in the number of classes we want.

Previous: `def __init__(self, block, num_blocks, num_classes=10)`
Now: `def __init__(self, block, num_blocks, num_classes)`

Previous: `def SENet18(): return SENet(PreActBlock, [2,2,2,2])`
Now: `def SENet18(num_classes=10): return SENet(PreActBlock, [2,2,2,2], num_classes)`

It works fine for me after that for the iceberg challenge.
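For context, using the modified model for a 2-class problem like iceberg would look roughly like the sketch below. It assumes the file lives at fastai/models/cifar10/senet.py, that `data` has already been built for the iceberg images, and that you train from scratch (the pre-trained CIFAR10 weights expect 10 output classes, as discussed below).

```python
# Sketch: a 2-class SENet18 for iceberg after the num_classes change.
# Import path assumes the file sits at fastai/models/cifar10/senet.py;
# 'data' is an ImageClassifierData built for the iceberg images.
from fastai.conv_learner import ConvLearner
from fastai.models.cifar10.senet import SENet18

model = SENet18(num_classes=2)
learn = ConvLearner.from_model_data(model, data)
learn.fit(1e-2, 3, cycle_len=1)
```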


Absolutely yes! I was also surprised at how much overfitting this training seemed to handle.

Did you use the pre-trained version? I tried this same change but it only works when using the model from scratch.

It didn’t work for me either when I used the pretrained version. It seems the weights were saved according to the number of classes (10 vs 2).
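If you still want the pre-trained weights, one workaround might be to copy over every tensor whose shape matches and leave the mismatched final layer at its random initialization. This is only a sketch in plain PyTorch, and the weight-file path is a made-up placeholder.

```python
import torch

# Sketch: load 10-class pre-trained weights into a 2-class SENet18, keeping
# every tensor whose shape matches and leaving the final layer at random init.
# The weight-file path below is a made-up placeholder.
model = SENet18(num_classes=2)
pretrained = torch.load('models/senet_cifar10_pretrained.h5', map_location='cpu')

state = model.state_dict()
state.update({k: v for k, v in pretrained.items()
              if k in state and v.shape == state[k].shape})
model.load_state_dict(state)
```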

Yeah, interestingly, when I used the pre-trained version without changing num_classes, I do get 10 predictions per image, but the first two of those predictions, corresponding to 0 and 1, seem to be accurate. Even though it’s a bit unorthodox, I wonder if we could just grab those first 2 predictions and use them on binary problems like iceberg.
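If anyone wants to try that, it would look roughly like the sketch below; it assumes log_preds came from learn.predict() in fastai 0.7 (log probabilities of shape (n, 10)) and simply renormalizes the first two columns.

```python
import numpy as np

# Sketch: keep only the first two of the 10 outputs for a binary problem.
# log_preds is assumed to come from learn.predict(), shape (n, 10), log-probs.
probs = np.exp(log_preds)[:, :2]                  # probabilities for classes 0 and 1
probs = probs / probs.sum(axis=1, keepdims=True)  # renormalize so each row sums to 1
```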
