Just wanted to ask: why does the training loss increase abruptly when I increase the epochs, even though the model was performing much better just before the new learn.fit() line? Given that I don't overfit at all.
The way I think about it: smaller images allow the model to first learn the overall structure, the big picture. As we increase the size, it starts filling in the details within that structure. This might be easier than trying to infer the big picture from a detailed view, and getting the big picture right is what I suppose helps with generalization.
But this is just my intuition. It makes sense to me that we might want our model to first learn shapes and only then textures, but I would take this reasoning with a grain of salt.
The main consideration here is whether this layered approach produces better results than training with the full image size from the start.
Yeah, that sounds about right. But we don’t have rigorous experiments yet to know whether this gives better accuracy in the end, or speeds up the time to get to some accuracy level, or whether it doesn’t really help at all - it’s something that some of the part 2 students are planning experiments for at the moment.
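To make the idea concrete, here's a minimal sketch of such a progressive-resizing schedule. The sizes, epoch counts, and the get_data/set_data loop body are all hypothetical, loosely modeled on the fastai 0.7 cifar10 notebook:

```python
def resize_schedule(sizes, epochs_per_stage):
    """Yield (size, epoch) pairs: train a few epochs at each image size,
    smallest first, so the model sees the big picture before the details."""
    for sz, n_epochs in zip(sizes, epochs_per_stage):
        for epoch in range(n_epochs):
            yield sz, epoch

# Hypothetical schedule for 32x32 CIFAR10 images:
stages = list(resize_schedule([16, 24, 32], [2, 2, 3]))

# In the fastai 0.7 notebooks the loop body would look roughly like:
#   for sz, _ in resize_schedule([16, 24, 32], [2, 2, 3]):
#       learn.set_data(get_data(sz, bs))  # swap in data at the new size
#       learn.fit(lr, 1)
```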
Thanks for the explanation. I assume this would only make sense if the image is being resized/rescaled when set to different “sz” values. If it’s just doing random cropping to “sz x sz”, the model would only be learning part of the image instead of the big picture. I’m going to start digging into the fastai transforms source code later.
P.S. Is there a specific reason for setting the padding to “sz//8”?
Yes! This is the augmentation (plus random horizontal flipping) that was used in many papers that train on CIFAR10. For instance, the ResNet paper uses this data augmentation.
I played around with this some time ago and this is what the augmented images look like:
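For reference, here's a minimal pure-Python sketch of that pad-then-random-crop augmentation, plus the random horizontal flip. Zero padding is used here for simplicity; for 32x32 CIFAR10 images the pad would be sz//8 = 4 pixels:

```python
import random

def pad_and_random_crop(img, pad):
    """Zero-pad a 2D image (list of rows) by `pad` pixels on every side,
    then crop back to the original size at a random offset."""
    h, w = len(img), len(img[0])
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i, row in enumerate(img):
        for j, val in enumerate(row):
            padded[i + pad][j + pad] = val
    top = random.randint(0, 2 * pad)    # random vertical offset
    left = random.randint(0, 2 * pad)   # random horizontal offset
    return [row[left:left + w] for row in padded[top:top + h]]

def random_hflip(img, p=0.5):
    """Random horizontal flip - the other half of the augmentation."""
    return [row[::-1] for row in img] if random.random() < p else img
```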
Any insights about the massive differences in learning rates here, on the first learn.fit of the cifar10 notebook?
Whatever Jeremy is running is taking 40.8 seconds to run, with a batch size of 128. My old GTX 650 Ti with only 2GB RAM, and with batch size reduced to 32 to cope, takes almost 15 minutes!
Of course I intend to upgrade when I can, but what would I expect to get if I chose a 1080 Ti - would it be anywhere near what Jeremy shows I wonder?
Thanks Jeremy - that’s encouraging to know! My current system can take a 1070 Ti without any further hardware upgrades - how much slower do you think that would be?
I tried to implement the entire process from scratch with PyTorch. Using SNet, the best accuracy I could get is only around 93.2%. Are there any tricks in the fastai library that could push the result even further?
My training process has:
Data augmentation (random flip, random crop with padding)
Weight decay, momentum
SGDR - implemented using the LambdaLR class from PyTorch; a bad idea, but it worked
Snapshot ensembling - helped a little with the results I submitted to Kaggle
Learner class - a poor man’s version, just to make it easier to change the data size
What else should I be doing? Here’s my notebook. I would really appreciate some input.
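For what it's worth, the SGDR-via-LambdaLR trick from the list above can be sketched like this: LambdaLR multiplies the base learning rate by whatever factor the lambda returns, so a cosine that restarts every cycle gives you warm restarts. The cycle length here is a hypothetical value:

```python
import math

CYCLE_LEN = 10  # epochs per restart (hypothetical)

def sgdr_factor(epoch):
    """Cosine decay from 1 toward 0 within each cycle, jumping back
    to 1 at every restart - the SGDR schedule as an LR multiplier."""
    t = epoch % CYCLE_LEN
    return 0.5 * (1 + math.cos(math.pi * t / CYCLE_LEN))

# With PyTorch (sketch):
#   from torch.optim.lr_scheduler import LambdaLR
#   scheduler = LambdaLR(optimizer, lr_lambda=sgdr_factor)
#   ... call scheduler.step() once per epoch ...
```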
Another note: it seems that whether I use differential data size training (is there a term for this technique?) or not, the end results are similar to those I get training with 32x32 right from the start (around 93.2%). However, it seems to maintain a better balance between train loss and val loss at the early stages. Is there any way to take advantage of this?
I found that senet154 has been added to the fastai library, but I don’t know how to use it.
The original method is not available:
arch = senet154
learn = ConvLearner.pretrained(arch, data)
What should I do?
Thanks!
Hi! Really nice work. I am a bit late, and I don’t know if this comment will still be accurate since the library has evolved a lot since then, but the default loss function for training given by ImageDataBunch is now FlattenedLoss of CrossEntropyLoss(), which wraps PyTorch’s CrossEntropyLoss. According to the docs:
This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.
So I don’t think the LogSoftmax() activation at the end of your model is necessary, because you would then have two activation functions. Again, I don’t know if this is relevant to previous versions of fastai, but it could maybe help other people with this issue today.
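A small sanity check on why the extra layer is redundant rather than harmful: log-softmax is idempotent, so a loss that applies log-softmax internally on top of a model that already ends in LogSoftmax() produces the same numbers; it's just unnecessary. A pure-Python sketch:

```python
import math

def log_softmax(xs):
    """Numerically stable log-softmax of a list of logits."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

logits = [2.0, 1.0, 0.1]    # hypothetical raw model outputs
once = log_softmax(logits)  # what the cross-entropy loss does internally
twice = log_softmax(once)   # model's LogSoftmax + the loss's log-softmax
# `once` and `twice` are numerically identical: exp(log_softmax(x))
# sums to 1, so the second log-sum-exp term is log(1) = 0.
```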