Any ideas on how to modify ResNet for images that have 2 channels?
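One common approach (a generic sketch, not something proposed in this thread; the layer name conv1 follows the torchvision convention, and the weight-averaging initialisation is just one option) is to replace the pretrained network's first convolution with a 2-channel one:

import torch
import torch.nn as nn
from torchvision import models

# Sketch: adapt a pretrained torchvision ResNet to 2-channel input by swapping its first conv.
net = models.resnet34(pretrained=True)
old = net.conv1                       # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
new = nn.Conv2d(2, old.out_channels, kernel_size=old.kernel_size,
                stride=old.stride, padding=old.padding, bias=False)
with torch.no_grad():
    # initialise each 2-channel filter from the mean of the corresponding RGB filter
    new.weight.copy_(old.weight.mean(dim=1, keepdim=True).repeat(1, 2, 1, 1))
net.conv1 = new                       # the rest of the network is unchanged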

@yinterian Thank you for sharing this, this challenge looks really interesting! I just signed up but don’t have access to the dataset yet…

BTW I’m loving all the ways ImageClassifierData can load data in different formats; in particular, from_csv felt “magical” the first time I used it :slight_smile:

@jeremy here is the notebook

You can get rid of the first error by downloading the model weights:

mkdir wgts
cd wgts
wget http://webia.lip6.fr/~cadene/Downloads/inceptionv4-97ef9c30.pth

But I am getting an error when parsing this model:

IndexError                                Traceback (most recent call last)
<timed exec> in <module>()

~/fastai/courses/dl1/fastai/conv_learner.py in pretrained(self, f, data, ps, xtra_fc, xtra_cut, **kwargs)
     86     @classmethod
     87     def pretrained(self, f, data, ps=None, xtra_fc=None, xtra_cut=0, **kwargs):
---> 88         models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg, ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut)
     89         return self(data, models, **kwargs)
     90 

~/fastai/courses/dl1/fastai/conv_learner.py in __init__(self, f, c, is_multi, is_reg, ps, xtra_fc, xtra_cut)
     31         cut-=xtra_cut
     32         layers = cut_model(self.f(True), cut)
---> 33         self.nf=num_features(layers[-1])*2
     34         layers += [AdaptiveConcatPool2d(), Flatten()]
     35         self.top_model = nn.Sequential(*layers)

~/fastai/courses/dl1/fastai/model.py in num_features(m)
     21     if hasattr(c[-2], 'num_features'): return c[-2].num_features
     22     elif hasattr(c[-2], 'out_features'): return c[-2].out_features
---> 23     return num_features(children(m)[-1])
     24 
     25 

~/fastai/courses/dl1/fastai/model.py in num_features(m)
     17 def num_features(m):
     18     c=children(m)
---> 19     if hasattr(c[-1], 'num_features'): return c[-1].num_features
     20     elif hasattr(c[-1], 'out_features'): return c[-1].out_features
     21     if hasattr(c[-2], 'num_features'): return c[-2].num_features

IndexError: list index out of range

@yinterian Yes, same for me. I have not investigated this yet, but it looks like c = children(m) returned an empty list, so c[-1] cannot index an element of it.
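For reference, a minimal sketch (re-using the fastai 0.7 definitions quoted in the traceback above) that reproduces the failure on a leaf module:

import torch.nn as nn

def children(m): return list(m.children())

def num_features(m):
    c = children(m)
    if hasattr(c[-1], 'num_features'): return c[-1].num_features      # IndexError when c == []
    elif hasattr(c[-1], 'out_features'): return c[-1].out_features
    if hasattr(c[-2], 'num_features'): return c[-2].num_features
    elif hasattr(c[-2], 'out_features'): return c[-2].out_features
    return num_features(children(m)[-1])

leaf = nn.ReLU()          # a leaf module has no children
print(children(leaf))     # [] -> c[-1] raises IndexError: list index out of range
num_features(leaf)        # reproduces the error in the traceback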

I’m posting my question in this thread since it discusses the Statoil/C-CORE Iceberg Kaggle challenge.

Summary:

  • I’ve created 3-channel 75x75 images from HH, HV and HH/HV and followed the lesson 1 steps to train the image classifier (a rough sketch of the channel construction follows this list):
    • set sz to 75
    • use resnet34
    • precompute activations
    • when searching for the optimal learning rate I get the following result
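The channel construction, roughly (assuming the Kaggle train.json layout with band_1/band_2 as flat lists of 75x75 values; the output folder name is just illustrative):

import numpy as np
import pandas as pd
from PIL import Image

df = pd.read_json('train.json')
for _, row in df.iterrows():
    hh = np.array(row['band_1']).reshape(75, 75)      # HH band
    hv = np.array(row['band_2']).reshape(75, 75)      # HV band
    img = np.stack([hh, hv, hh / hv], axis=-1)        # third channel: HH/HV
    # min-max scale each channel to 0-255 so it can be saved as an ordinary RGB image
    lo, hi = img.min(axis=(0, 1)), img.max(axis=(0, 1))
    img = (img - lo) / (hi - lo) * 255
    Image.fromarray(img.astype(np.uint8)).save(f'train_png/{row["id"]}.png')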

Question 1: What should I make of the plot of loss vs. learning rate? The loss starts to increase with the learning rate, as in dogs vs. cats, but drops again at a high learning rate. If I manually set the learning rate to >0.1 in fit I get bad results. The best I could get was with a learning rate of 0.003.

Question 2: Since the training set is rather small (~1600 images, divided 70:30 between train/valid), is the above drop just a fluctuation? Would it make sense to do k-fold cross-validation or data augmentation during this step as well?

  • I continued with the remaining steps (sketched in code after this list):
    • Train last layer with data augmentation (i.e. precompute=False) for 2-3 epochs with cycle_len=1
    • Unfreeze all layers
    • Set earlier layers to 3x-10x lower learning rate than next higher layer
    • Train full network with cycle_mult=2 until over-fitting
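A sketch of these steps in the fastai 0.7 API from lesson 1 (PATH, label_csv and val_idxs are placeholders for this dataset, and the learning rates are only indicative):

from fastai.conv_learner import *    # brings in resnet34, tfms_from_model, np, etc.

arch = resnet34
sz, bs = 75, 64
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(PATH, 'train', label_csv, tfms=tfms, val_idxs=val_idxs, bs=bs)
learn = ConvLearner.pretrained(arch, data, precompute=True)

learn.fit(1e-2, 2)                              # train the new head on precomputed activations
learn.precompute = False
learn.fit(1e-2, 3, cycle_len=1)                 # last layer group with data augmentation
learn.unfreeze()
lrs = np.array([1e-4, 1e-3, 1e-2])              # lower learning rates for earlier layer groups
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)    # train the full network with SGDR restarts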

Question 3: During the first iterations the training loss was much higher than the validation loss (see below). Why? I haven’t studied resnet34 in detail yet, but is this connected with dropout or any other form of regularization used in the model?

Thanks in advance for your insight.

Evolution of train/val loss (columns: epoch, training loss, validation loss, accuracy):

[ 0. 0.24709 0.5274 0.74479]
[ 1. 0.38093 0.51635 0.73958]
[ 2. 0.475 0.49452 0.74479]
[ 3. 0.51119 0.47915 0.76302]
[ 4. 0.52635 0.47595 0.72656]
[ 5. 0.54253 0.45467 0.78125]
[ 6. 0.54397 0.46282 0.76562]
[ 7. 0.54832 0.43534 0.78385]
[ 8. 0.53713 0.4295 0.79688]
[ 9. 0.52633 0.42094 0.79427]
[ 10. 0.54765 0.4208 0.79688]
[ 11. 0.52089 0.41895 0.80469]
[ 12. 0.50348 0.42423 0.78646]
[ 13. 0.50157 0.43473 0.79167]
[ 14. 0.49071 0.40819 0.8099 ]
[ 15. 0.47678 0.41378 0.79948]
[ 16. 0.47423 0.40242 0.78906]
[ 17. 0.47264 0.39497 0.80208]
[ 18. 0.48167 0.3975 0.8151 ]
[ 19. 0.47852 0.38546 0.82812]
[ 20. 0.45687 0.39118 0.79948]
[ 21. 0.47543 0.3832 0.8125 ]
[ 22. 0.47485 0.38656 0.80208]
[ 23. 0.4747 0.38563 0.80208]
[ 24. 0.45017 0.37487 0.8125 ]
[ 25. 0.4352 0.37554 0.82292]
[ 26. 0.41879 0.37579 0.80469]
[ 27. 0.41228 0.37207 0.8125 ]
[ 28. 0.42357 0.3771 0.79948]
[ 29. 0.42048 0.37628 0.78646]
[ 30. 0.43349 0.36884 0.8151 ]

Question 1: You want to pick the largest learning rate just before your loss increases. The idea here is to optimize training speed without losing model performance (this gives me 0.01).

Question 2: You can decrease the batch size to get more iterations, since the learning rate finder only scans your data once (one epoch). So the fluctuation is more related to having few iterations (training data / bs). You can always go with cross-validation, I assume, especially when training data is scarce.
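For example (a sketch re-using the fastai 0.7 calls from lesson 1; bs=16 is just an example value):

data = ImageClassifierData.from_csv(PATH, 'train', label_csv, tfms=tfms, val_idxs=val_idxs, bs=16)
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.lr_find()
learn.sched.plot()      # pick the largest learning rate just before the loss starts to climb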

Question 3: I am not certain about this part, but in general, as you tune any kind of ML model without overfitting, you tend to get closer training and validation scores (the lowest bias/variance). I assume it has to do with the initialization of the model in this case and the randomness of the train/val split: your train and validation splits could just as well have been swapped, which would give the opposite scenario of a much higher loss on validation than on training.


Anze,
Since you don’t have that much data I would experiment with not unfreezing all layers (it may be that you are trying to learn way too many parameters). It is very weird that you are not overfitting.
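For example (a sketch with the fastai 0.7 API; with the usual three layer groups, freeze_to(-2) keeps the earliest group frozen):

learn.freeze_to(-2)                  # only the last two layer groups are trainable
learn.fit(1e-2, 3, cycle_len=1)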

Yes, which is definitely not what you want! You never want to increase the original size of your images, since it wastes compute time with no benefit.

Nothing wrong with using a 7x7 kernel size - I suspect based on this question you may be misunderstanding how kernels work… Maybe you could explain more about what you think might be the problem with this kernel size?

In general, you should set sz to the size of your input images, unless they are too large to handle directly on your GPU.
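A minimal check (the path and file name are hypothetical):

from PIL import Image
print(Image.open('data/iceberg/train/some_image.png').size)   # e.g. (75, 75) for this dataset
sz = 75                                                        # match sz to the native resolution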


There are no arguments to those specific things - they are predefined sets of transforms to use as-is. Have a look at how each of them is defined to see how to configure your own transform list. @yinterian maybe you could show a couple of examples?
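For instance, a sketch of a custom list (assuming fastai 0.7, where the predefined sets live in fastai/transforms.py; transforms_side_on is essentially [RandomRotate(10), RandomLighting(0.05, 0.05), RandomFlip()]):

from fastai.conv_learner import *

sz = 75
my_tfms = [RandomRotate(5), RandomLighting(0.05, 0.05), RandomDihedral()]   # custom augmentation set
tfms = tfms_from_model(resnet34, sz, aug_tfms=my_tfms, max_zoom=1.1)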

Those should all be fixed now. git pull, and grab http://files.fast.ai/models/weights.tgz and untar it into the fastai/fastai folder (i.e. where the .py modules are).

Inception_v4 works fine - in 4 iterations it achieves <0.19 loss on the dog breeds competition, while the resnets can’t get better than 0.3. Now I am trying learn.predict(is_test=True), and it looks like the predict_with_targs function expects to see both x and y, no matter whether it is the validation set or the test set (model.py line 120):

    preda,targa = zip(*[(get_prediction(m(*VV(x))),y)
                        for *x,y in iter(dl)])

UPDT: OK, it’s not about x and y, but it throws an error.

Forgot to include the error message:

~/fastai/courses/dl1/fastai/model.py in predict_with_targs(m, dl)
    119     if hasattr(m, 'reset'): m.reset()
    120     preda,targa = zip(*[(get_prediction(m(*VV(x))),y)
--> 121                         for *x,y in iter(dl)])
    122     return to_np(torch.cat(preda)), to_np(torch.cat(targa))
    123 

TypeError: 'NoneType' object is not iterable

Can you show the code you’re running before that error? Perhaps a screenshot?

@jeremy first 9 cells in

@jeremy Solved. Sorry for bothering you, Jeremy - it looks like the test images were not loaded correctly. Even restarting the notebook did not help. I changed the image size and processed train/test/valid once again, and this helped.
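For reference, the NoneType error above can also occur when no test set is attached to the data object (data.test_dl is None); a sketch with the fastai 0.7 API, where PATH, label_csv and val_idxs are placeholders:

import numpy as np
from fastai.conv_learner import *

data = ImageClassifierData.from_csv(PATH, 'train', label_csv, tfms=tfms,
                                    val_idxs=val_idxs, test_name='test', bs=bs)
learn = ConvLearner.pretrained(arch, data, precompute=False)
log_preds = learn.predict(is_test=True)    # log-probabilities for the test set
probs = np.exp(log_preds)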

@jeremy inception_v4 with simple pre-tuning gives <0.2 here: https://www.kaggle.com/c/dog-breed-identification/leaderboard. That’s currently in the top 10 out of 200 (the first 5 or so are using the whole Stanford dataset to build a model just for fun). And we are just at the “Lesson 1” stage :sunny:.


Thanks for your answers. Reducing the batch size indeed made the plot look more meaningful.


That’s v cool @sermakarevich! Try resnext101_64 too. And then take the average of that and inception
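A sketch of the averaging (the learner names are illustrative; fastai 0.7 predict() returns log-probabilities, hence the exp):

import numpy as np

probs_incep = np.exp(learn_incep.predict(is_test=True))      # inception_v4 learner
probs_resnext = np.exp(learn_resnext.predict(is_test=True))  # resnext101_64 learner
avg_probs = (probs_incep + probs_resnext) / 2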


BTW as part of fixing resnext et al, I tested resnext50 on dogs v cats, and got 99.75%! We’ll look at how to replicate this result on Monday :slight_smile:


@kcturgutlu, so what was causing the issue? (I’m having exactly the same problem, even after setting the test name.)

I blended inception_v4 with my previous model, which was a NN on top of features extracted from Keras Inception and Xception with pseudo-labelling. So it’s like a blend of fastai part 1 v1 and part 1 v2 :slight_smile:

UPDT: summary:

model                                                  loss    accuracy
resnext101_64                                          0.23    93
inception_v4                                           0.19    94
(inception_v4 + resnext101_64) / 2                     0.183   ~
(inception_v4 + keras inception + keras xception) / 3  0.17    ~