Training a model from scratch: CIFAR 10

yes, you’re correct @radek, we’re also using SGDR. To rephrase my question: how many epochs does it take to reach the same accuracy using SGDR, but starting with 32x32?

I agree with you that it’s easy to test this with the current notebook, but there are so many things to do and so little time :slight_smile:

1 Like

Absolutely, I can relate to that :slight_smile:

I am not even sure if the same results can be achieved without the resizing, but it would be really cool to see the comparison!

I modified the code in models/cifar10/senet.py so that we are able to pass in the number of classes we want.

Previous: def __init__(self, block, num_blocks, num_classes=10)
Now: def __init__(self, block, num_blocks, num_classes)

Previous: def SENet18(): return SENet(PreActBlock, [2,2,2,2])
Now: def SENet18(num_classes=10): return SENet(PreActBlock, [2,2,2,2], num_classes)

It works fine for me after that for the iceberg challenge.
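Put together, the change ends up looking roughly like this (just a sketch, not the exact file; SENet and PreActBlock are the existing classes in models/cifar10/senet.py, and the usage line at the end is an example, not part of the original change):

# in models/cifar10/senet.py, after the change:
def SENet18(num_classes=10):
    # forward num_classes to the SENet constructor instead of relying on the old hard-coded default of 10
    return SENet(PreActBlock, [2, 2, 2, 2], num_classes)

# then in the notebook, e.g. for the 2-class iceberg problem:
# model = SENet18(num_classes=2)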

3 Likes

Absolutely yes! I was also surprised at how well this training seemed to hold up against overfitting.

Did you use the pre-trained version? I tried this same change but it only works when using the model from scratch.

It didn’t work for me either when I used the pretrained version. It seems the weights were saved according to the number of classes (10 vs. 2).

Yeah, interestingly, when I used the pre-trained version without changing num_classes, I still got 10 predictions per image, but the first two of those predictions (corresponding to classes 0 and 1) seemed to be accurate. Even though it’s a bit unorthodox, I wonder if we could just grab those first two predictions and use them on binary problems like iceberg.
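Roughly what I have in mind (an untested sketch; it assumes learn is the fastai learner from the notebook and that learn.predict() returns log-probabilities over the 10 classes, as with the course models that end in log_softmax; if yours outputs raw logits, apply a softmax instead):

import numpy as np

log_preds = learn.predict()                        # shape (n_images, 10): log-probabilities from the 10-class head
probs = np.exp(log_preds[:, :2])                   # keep only the columns for classes 0 and 1
probs = probs / probs.sum(axis=1, keepdims=True)   # renormalise so the two columns sum to 1
is_iceberg = probs[:, 1]                           # probability of class 1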

2 Likes

I’m going to try to improve this workflow. Although @jamesrequa’s workaround is a nice hack! :slight_smile:

2 Likes

Got a 0.241 log loss using ResNet-152.

What I find is that the overall loss decreases (0.5 -> 0.28) as I increase the parameters of

learner.fit()

[loss graph]
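By parameters I mean the SGDR cycle settings, i.e. going from something like the first call below to the second (placeholder values; this assumes the learn.fit(lr, n_cycle, cycle_len=..., cycle_mult=...) signature used in the course notebooks):

# a short baseline run
learn.fit(1e-2, 1)

# "increasing the parameters": more cycles, each one longer than the last (SGDR)
learn.fit(1e-2, 3, cycle_len=1, cycle_mult=2)   # cycles of 1, 2 and 4 epochs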

@jakcycsl
I noticed that last night. I’m going to try that tonight.

But this hack sounds promising because I already have it implemented :slight_smile:

2 Likes

Did you solve the CUDA runtime error? I have the same error.

How do you load the intermediate model?

First you would need to have saved the model
learn.save('model')

Then you can load that same model back in with the saved weights and pick up training where you left off.
learn.load('model')
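So the whole interrupt-and-resume flow looks roughly like this (a sketch; the filename and fit parameters are just placeholders):

# train for a while, then checkpoint the weights
learn.fit(1e-2, 2, cycle_len=1)
learn.save('cifar10_partial')

# later, possibly on a fresh instance: rebuild the same learner, reload the weights and keep going
learn.load('cifar10_partial')
learn.fit(1e-2, 2, cycle_len=1)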

1 Like

What is stats = (np.array([ 0.4914 , 0.48216, 0.44653]), np.array([ 0.24703, 0.24349, 0.26159])) and what does tfms_from_stats do? I googled a little, but couldn’t figure it out.

I see, thanks!

I thought you could interrupt the model while training, kill the instance, go to sleep :slight_smile: and then reload some temporary model from a temporary file.

When you use a pretrained net for classification, you should normalize your data (at prediction time) with the same per-channel means as the training set. Often, the means of the ImageNet images are published, and you can just use them to normalize your data, which is exactly what is happening here.

4 Likes

(In this case the averages are of the CIFAR-10 data, which I calculated using numpy before I trained the CIFAR-10 model.)
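In case it’s useful, a sketch of how those numbers can be computed and used (it assumes trn_images is the CIFAR-10 training set as a float array in [0, 1] with shape (n, 32, 32, 3), and that tfms_from_stats is available via the usual from fastai.conv_learner import *):

# per-channel mean and std over the whole training set
stats = (trn_images.mean(axis=(0, 1, 2)), trn_images.std(axis=(0, 1, 2)))
# -> roughly (array([0.4914, 0.4822, 0.4465]), array([0.2470, 0.2435, 0.2616]))

# build transforms that normalise every image with exactly these stats
tfms = tfms_from_stats(stats, 32)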

4 Likes

Ah, my bad. CIFAR-10!

Thanks, so is it similar to the preprocess_input step in Keras?

You’re right. preprocess_input in Keras also does scaling in its series of steps.
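For comparison, the Keras side looks roughly like this (a sketch; exactly what preprocess_input does depends on the model family, e.g. the ResNet50 version reorders RGB to BGR and subtracts the ImageNet channel means rather than using the CIFAR-10 stats above):

import numpy as np
from keras.applications.resnet50 import preprocess_input

x = np.random.rand(4, 224, 224, 3) * 255.0   # dummy batch of raw pixel values in [0, 255]
x = preprocess_input(x)                      # model-specific normalisation, analogous to tfms_from_stats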