Training a model from scratch: CIFAR 10

In which path are we supposed to store all the pre-trained model weights?

Fastai's pre-trained model weights are at fastai/weights, but some of the models use PyTorch's pre-trained weights (VGG, ResNet, etc.); those are stored at ~/.torch/models/
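If you want to double-check on your own box, something along these lines should show both locations (just a sketch - the paths are the ones mentioned above and may differ on your setup):

import os

# weights shipped with the fastai repo (resnext, inception, etc.) - relative to where the repo lives
print(os.listdir('fastai/weights'))

# PyTorch's pre-trained models (VGG, ResNet, ...) cached by torch
print(os.listdir(os.path.expanduser('~/.torch/models')))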


For SENet it's a bit different from all the other pre-trained weights, which are automatically loaded from the fastai/weights directory. Since for this model we are actually using learn.load to load the weights in, you'll need to place that particular weight file in the models folder inside your data directory.

You'll actually want it to be in the same place as your other model files for that dataset, which is f'{PATH}models'.
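So roughly (a sketch, assuming the weight file is the sen_32x32_8 one mentioned below, and that learn.load looks for a .h5 file in the same place learn.save writes one):

# expected layout: {PATH}models/sen_32x32_8.h5
learn.load('sen_32x32_8')   # resolves the file under f'{PATH}models'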


Yes exactly, that's what I was trying to say :slight_smile: I just wanted to point out that this is not the same location as for the weights of other pre-trained ImageNet models like inception4, resnext, etc., since those are all loaded from fastai/weights. (Of course, after you train them yourself, the new fine-tuned weights end up in f'{PATH}models'.)


@radek No magic here :frowning:

# assuming the usual fastai notebook imports, and that PATH and sz are already defined:
from fastai.conv_learner import *
from fastai.models.cifar10.senet import SENet18

m = SENet18()
bm = BasicModel(m.cuda(), name='cifar10_SENet18')

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(m, sz))
#data = ImageClassifierData.from_paths(PATH)

learn = ConvLearner(data, bm)
learn.unfreeze()
learn.load('sen_32x32_8')

learn.fit(0.01, 5)

@zpnc No num_classes argument is enabled on SENet18() :frowning:

This is predicting 10 classes

Out[24]:
array([[ -0.03931,  -3.25638, -11.64748, -13.02063, -13.74167, -14.32404, -13.12414, -16.31706, -15.34966, -13.79075],
       [ -0.00171,  -6.37512, -17.67231, -21.87936, -24.85791, -24.22245, -22.4937 , -26.0106 , -24.5726 , -23.42298],
       [ -0.00004, -10.18968, -25.22412, -33.11356, -37.17604, -35.85092, -32.23149, -38.18449, -36.37085, -34.89988],
       [ -0.00043,  -7.74531, -16.0502 , -23.37728, -22.80679, -24.57661, -22.05182, -25.2618 , -23.55353, -23.559  ],
       [ -0.18095,  -1.79912,  -9.98963, -12.32735, -12.66292, -13.37598, -10.96291, -14.3664 , -13.65081, -11.73629],
       [ -0.00154,  -6.4734 , -17.16404, -25.36037, -26.31315, -26.37898, -23.26433, -26.81647, -23.59966, -24.0616 ],
       [ -0.00007,  -9.62542, -25.56021, -32.92747, -35.47162, -34.49596, -31.37358, -36.26967, -33.80552, -32.72007],
       [ -0.00002, -10.69174, -23.49642, -33.88793, -34.70346, -35.1624 , -31.05718, -36.33319, -32.74025, -33.47608],
       [ -0.00613,  -5.09705, -13.74302, -17.84868, -18.88378, -19.37073, -16.35867, -20.6087 , -19.64924, -18.3053 ],
       [ -0.08595,  -2.49756, -10.14199, -12.39996, -12.31104, -13.47758, -12.54866, -13.82268, -12.55679, -11.21001]], dtype=float32)
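For what it's worth, those look like log-probabilities (the first column is near 0, i.e. probability near 1), so a quick sanity check that you really have 10 classes could be something like this (a sketch, assuming the array above came from learn.predict()):

import numpy as np

log_preds = learn.predict()             # log-probabilities over the 10 CIFAR10 classes, shape (n_images, 10)
probs = np.exp(log_preds)               # back to probabilities
preds = np.argmax(log_preds, axis=1)    # predicted class index per image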

Hey @gerardo - I am sorry, you were right and I was wrong. Apparently, there are many ways to skin a cat using the fastai library.

The difference is in how our model gets constructed here - apparently there is a class called BasicModel, and by instantiating it we are doing part of the job that calling ConvLearner.pretrained does for us when doing transfer learning…

The way to solve this (I think - I can't look at the code right now, so I'm speaking from memory) is to look at ConvBuilder in learner.py and redo the relevant steps from its init function on our model. What we are doing here is way cooler than in part 1 v1, as there is an adaptive pooling layer (@jeremy's invention? a very neat thing indeed! :slight_smile: ) that we slap onto the convolutional part of the model - it is a concatenation of doing avg pool and max pool on the earlier layer, so we get the best of both worlds!
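The pooling trick from the paragraph above, written out in plain PyTorch, is roughly this (just a sketch of the concept - fastai has its own layer for it, AdaptiveConcatPool2d if I remember the name correctly):

import torch
import torch.nn as nn

class ConcatPool2d(nn.Module):
    # concatenate adaptive avg pooling and adaptive max pooling over the last conv feature map
    def __init__(self, output_size=1):
        super().__init__()
        self.avg = nn.AdaptiveAvgPool2d(output_size)
        self.max = nn.AdaptiveMaxPool2d(output_size)

    def forward(self, x):
        # best of both worlds: avg-pooled and max-pooled features, stacked along the channel dim
        return torch.cat([self.avg(x), self.max(x)], dim=1)

Note that because of the concatenation, the linear layer that follows gets twice as many input features as the conv part outputs.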

I cannot look at this right now and have one other notebook I need to finish one thing on, but the moment I jump back to using the fastai library for the deep learning part - which should be tonight hopefully, or tomorrow at the latest - I will figure this out and post here, unless someone else beats me to it :slight_smile: Or maybe the steps in learner.py that I reference would suffice to reproduce this?

Anyhow - if this gets cleared up before I get a chance to figure it out myself and post here, please let me know :slight_smile: Otherwise, I will correct the incorrect answer I gave you last time by posting the steps for how to change the model as soon as I feasibly can.

I tried using the SENet weight file that was pre-trained on CIFAR10 but couldn't get it to work… so of course I decided to train it from scratch :slight_smile: It's really cool/awesome to run through Jeremy's notebook just to see the progression from a loss of ~2 on 8x8 images up to state-of-the-art results! If anyone has the time, I would recommend doing it - I think it helps build up an "intuition"

Anyway, if anyone runs into similar issues with that weight file, feel free to reach out - you can also try using mine.


I'm in the process of training the resnext 29 8 64 from scratch to get experience and intuition for cases (tasks) where no one will provide the almost-perfect weights :wink:

The process takes time and I have to ‘interrupt’ it to stop the instance and get some sleep. @jeremy do you get any sleep at all? The amount of work that you do is just amazing :smiley:

But seriously: if you have to 'interrupt' the process, don't forget, the next time you load your intermediate model, to unfreeze it before you continue training. In addition, be sure to set the data set with the correct image size.
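Something like this is what I mean - a rough sketch only, with placeholder names for the saved weights, the architecture variable (arch) and the learning rate (lr):

sz = 32                                  # whatever image size you were training at when you stopped
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn.set_data(data)                     # point the learner at data with the correct image size
learn.load('resnext_intermediate')       # placeholder name for your saved intermediate weights
learn.unfreeze()                         # don't forget this before continuing training
learn.fit(lr, 1, cycle_len=1)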


@jeremy towards the end of your training of the resnext model in cifar10.ipynb (image size 32x32), the model looks like it is severely overfitting (training loss < 0.10; validation loss between ~0.2 and 0.3), but you kept on fitting. Does this mean that we should ignore overfitting as long as the validation loss keeps decreasing?

Here’s my CIFAR10 from scratch with a Custom PyTorch Model. Based on Examples from Jeremy, Yannet and my fellow learners :slight_smile:

Appreciate any feedback. The goal was not to create the best model, but to show yet another example of how you could create a custom model and integrate it with the Learner.


@gerardo I think that this is what you may have been looking for :slight_smile: Great stuff @ramesh, thx for sharing this :slight_smile:

We have feature requests to add Early Stopping and Model Checkpoints to prevent overfitting. You can find more here, under the Learner Fit Options section of the wiki: Fastai Library Feature Requests.

I would like to know the thoughts on when to stop the model fitting process.

Maybe have an early_stopping_rounds (as in XGBoost) / patience (as in Keras) kind of parameter that terminates the training process once the validation loss doesn't decrease for a given number of rounds/epochs? This seems to work more often than not on many problems, and many people are familiar with this API too. Do you suggest any better approaches?

EDIT: loss -> validation loss.
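Just to illustrate the kind of behaviour I mean, a patience-based stop could look roughly like this (plain Python sketch, not an existing fastai option; train_one_epoch and compute_val_loss are hypothetical helpers):

patience, max_epochs = 5, 100
best_val_loss, epochs_without_improvement = float('inf'), 0

for epoch in range(max_epochs):
    train_one_epoch(model)                  # hypothetical helper: one pass over the training set
    val_loss = compute_val_loss(model)      # hypothetical helper: loss on the validation set
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0      # improvement -> reset the counter
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                           # stop once val loss hasn't improved for `patience` epochs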


I think this is coming up in the course when we discuss creating our own parts of models, or entire models. I started typing out a very long response again, but the way @jeremy walks through the methodology of going about this (starting with overfitting and then working towards a really well-trained model) is imho outstanding :slight_smile:

I would not worry about overfitting as such just yet and in general would defer to @jeremy with regards to what the best course of action should be here.

BTW, a gentleman who goes by the nick @smerity on Twitter wrote this super awesome post on model architecture and simplicity, and I think if you combine the walkthrough of the methodology with the contents of this article, you will be in deep learning practitioner paradise :smiley:


Which loss do you look at for not decreasing? With overfitting, you will have a U-shaped loss curve on your validation set, and the train set loss will stop decreasing only when you overfit.

But if you stop there at the bottom of the validation loss valley, this is also not great - you are completely skipping all the goodies that SGDR gives us in terms of finding a nice spot in the weight space.

But the main concern is - if you do early stopping, how do you know it is the best the model could get? It's sort of like keeping your fingers crossed that, by training a model with an incompatible architecture and stopping early, you will somehow arrive at the best answer you could get on a different but related question that the model was not designed to answer :slight_smile:

What @jeremy has for us is 1000x better imho, and I never came across this methodology being shared anywhere outside of fastai :slight_smile: I hear it is known in the deep learning community, but I literally heard about it first from @jeremy and never heard about it anywhere else.

@ramesh This looks great! Thanks for sharing.

Well, I agree with you. This line makes a lot of sense!

Has anyone tried to reach >0.9 accuracy by starting with 32x32 images from the beginning? I would like to understand what it takes (how many epochs, learning rate schedule, etc.) to get to the same accuracy as we do now (start training with 8x8, move to 16x16, and so on) in ~40-50 epochs in total. If I remember correctly, the authors of the cifar10 resnext model trained for 300 epochs. To me it looks like the model learns faster when you gradually increase the image size.
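For reference, the gradual-resizing schedule I mean is roughly this (a sketch of what the notebook does, with placeholder sizes and cycle counts):

# train on small images first, then reuse the same weights on progressively larger ones
for sz, n_cycle in [(8, 2), (16, 2), (32, 4)]:                  # placeholders, not the notebook's exact numbers
    data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), bs=bs)
    learn.set_data(data)                                        # same learner/weights, larger images
    learn.fit(lr, n_cycle, cycle_len=1)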

We also use SGDR, which is known for speeding things up quite a bit :slight_smile:
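i.e. the cycle arguments to fit, something along these lines (the numbers are placeholders):

# SGDR: restart the learning rate at the start of each cycle, doubling the cycle length each time
learn.fit(1e-2, 3, cycle_len=1, cycle_mult=2)   # 3 cycles of 1, 2 and 4 epochs -> 7 epochs total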

Nonetheless, such a comparison would be cool :slight_smile: I think this could even be pulled off using this notebook, just without the cycles, etc., and also comparing it against not resizing the images.