Dog Breed Identification challenge

sermakarevich · November 21, 2017, 6:24pm

Based on loss on my train set.

suvash · November 21, 2017, 6:27pm

Super. Was planning to use this for the next batch of uninterrupted time I get. Were you able to use Nasnet right off the example shared by Jeremy, or were there any gotchas?

sermakarevich · November 21, 2017, 6:29pm

From Jeremy’s example. It should be as simple as with any other model, it just takes a bit longer.

bushaev · November 21, 2017, 7:22pm

Hi! So has anyone tried not removing the last fc layer?
I tried it with resnet34 and inception_4, but the results aren’t so great and I have a couple of questions.

resnet34 gives 1000 probabilities for each example, which is what I would expect. However, inception_4 gives 1536 of those. Why is that?
resnet34 gave really horrible results. With inception_4 I got .93 accuracy and 0.23 log_loss on the validation set, which is pretty good but does not really compare to the results of people in the top 10. What can I do to improve my results?
Some of my thoughts as well:
I guess I can combine predictions from different models to get better results (could it help?)
I trained logistic regression on top of the predictions. Would an MLP work better?

connelly · November 21, 2017, 7:28pm

I am trying to do a last-step fit with the entire training set before having the model predict the test set. I tried simply not setting val_idxs when calling from_csv, but when I call get_data I get the error below.

I see in the from_csv definition that the default value is None. I have tried passing an empty numpy array and I get the same error.

IndexError                                Traceback (most recent call last)
<ipython-input-38-464872a660af> in <module>()
----> 1 data = get_data(sz, bs)

<ipython-input-37-7d8c25715bc7> in get_data(sz, bs)
      1 def get_data(sz, bs):
      2     tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
----> 3     data = ImageClassifierData.from_csv(PATH, 'train' ,f'{PATH}labels.csv', test_name='test', val_idxs=[], suffix='.jpg', tfms=tfms, bs=bs)
      4     return data if sz > 300 else data.resize(340, 'tmp')

~/fastai/courses/dl1/fastai/dataset.py in from_csv(cls, path, folder, csv_fname, bs, tfms, val_idxs, suffix, test_name, continuous, skip_header, num_workers)
    348         """
    349         fnames,y,classes = csv_source(folder, csv_fname, skip_header, suffix, continuous=continuous)
--> 350         ((val_fnames,trn_fnames),(val_y,trn_y)) = split_by_idx(val_idxs, np.array(fnames), y)
    351 
    352         test_fnames = read_dir(path, test_name) if test_name else None

~/fastai/courses/dl1/fastai/dataset.py in split_by_idx(idxs, *a)
    361 def split_by_idx(idxs, *a):
    362     mask = np.zeros(len(a[0]),dtype=bool)
--> 363     mask[np.array(idxs)] = True
    364     return [(o[mask],o[~mask]) for o in a]

IndexError: arrays used as indices must be of integer (or boolean) type

bushaev · November 21, 2017, 7:30pm

You can use val_idxs=[0] to get rid of that error and have only one image in your validation set
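
For context, the error comes from NumPy rather than from the data: an empty list turns into a float array, and float arrays cannot be used as indices, which is what the IndexError in the traceback reports. Below is a minimal sketch of a local workaround, assuming you are willing to patch split_by_idx yourself. The dtype=int cast and the toy file names are additions for illustration, not part of the fastai code shown in the traceback.

```python
import numpy as np

# An empty list becomes a float64 array, which NumPy refuses to use as an index.
assert np.array([]).dtype == np.float64

def split_by_idx(idxs, *a):
    """Variant of the split_by_idx from the traceback, with an explicit integer cast."""
    mask = np.zeros(len(a[0]), dtype=bool)
    mask[np.array(idxs, dtype=int)] = True  # dtype=int keeps an empty idxs valid
    return [(o[mask], o[~mask]) for o in a]

# Toy stand-ins for the file names and labels read from labels.csv.
fnames = np.array(['a.jpg', 'b.jpg', 'c.jpg'])
y = np.array([0, 1, 2])

(val_fnames, trn_fnames), (val_y, trn_y) = split_by_idx([], fnames, y)
print(len(val_fnames), len(trn_fnames))
```

Whether the rest of the pipeline copes with a completely empty validation set is not checked here, so val_idxs=[0] remains the simpler option.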

metachi · November 21, 2017, 7:35pm

Looks like inception_4 is doing something different…

def inception_4(pre):
    return children(load_pre(pre, InceptionV4, 'inceptionv4-97ef9c30'))[0]

instead of…

def resnext50(pre): return load_pre(pre, resnext_50_32x4d, 'resnext_50_32x4d')

I changed inception_4 to:
def inception_test(pre): return load_pre(pre, InceptionV4, 'inceptionv4-97ef9c30')
and got 1001 outputs
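
A toy PyTorch sketch of why taking the first child changes the output size, assuming that fastai's children helper returns a model's direct submodules as a list. ToyNet and its layers are made up for illustration. Only the sizes 1536 and 1001 come from this thread.

```python
import torch
import torch.nn as nn

class ToyNet(nn.Module):
    """Hypothetical stand-in: a feature extractor followed by a classifier head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 1536, kernel_size=3),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classif = nn.Linear(1536, 1001)

    def forward(self, x):
        return self.classif(self.features(x))

model = ToyNet().eval()
x = torch.randn(2, 3, 8, 8)

with torch.no_grad():
    full_out = model(x)                   # whole model, head included
    body = list(model.children())[0]      # first child only: the feature extractor
    body_out = body(x)

print(full_out.shape, body_out.shape)
```

The full model returns one value per class of the head, while the first child stops at the features, so the width of its output is the head's input size.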

A_TF57 · November 21, 2017, 7:37pm

I started with resnet34 and realized it gave me really bad results. I then moved to resnext101_64 which was better (around 94% accuracy and close to 0.25 log loss).

Later, I switched to inception_v4 and after training for some time, experimenting with different sizes, the accuracy jumped to ~95% and the log loss dropped to 0.18.

I took the average of the probabilities of these 2 models and submitted, with a final log loss of 0.159! So ensembling, or in this case simple averaging, helped me a lot.
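
A minimal sketch of that kind of averaging, using made-up probabilities for three images and three classes (not real competition predictions). The helper names log_loss and average_ensemble are hypothetical.

```python
import numpy as np

def log_loss(probs, labels, eps=1e-15):
    """Multiclass log loss: mean negative log probability of the true class."""
    probs = np.clip(probs, eps, 1 - eps)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def average_ensemble(*prob_arrays):
    """Element-wise mean of several (n_samples, n_classes) probability arrays."""
    return np.mean(np.stack(prob_arrays), axis=0)

# Toy data: true labels and the predictions of two imaginary models.
labels = np.array([0, 1, 2])
model_a = np.array([[0.7, 0.2, 0.1],
                    [0.3, 0.4, 0.3],
                    [0.1, 0.2, 0.7]])
model_b = np.array([[0.4, 0.5, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.2, 0.2, 0.6]])

ens = average_ensemble(model_a, model_b)
print(log_loss(model_a, labels), log_loss(model_b, labels), log_loss(ens, labels))
```

In this toy case each model is weak on a different image, so the averaged predictions score a lower log loss than either model alone. With real models the gain depends on how different their mistakes are.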

Would love to build on this further and try nasnet next.

bushaev · November 21, 2017, 8:07pm

I’m trying to get predictions from the pretrained resnext101_64 model, but I’m getting a weird error

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-17-3bcdb70995d3> in <module>()
----> 1 resnext_preds = get_model_predictions(resnext101_64, 'resnet')

<ipython-input-14-2f9714fa7f7d> in get_model_predictions(f_model, name)
      6     bm = BasicModel(m.cuda(), name=name)
      7     learn = ConvLearner(data, bm)
----> 8     predictions, y = learn.TTA()
      9     save_array('predictions_' + name, predictions)
     10     return predictions, y

~/fastai/fastai/learner.py in TTA(self, n_aug, is_test)
    167         dl1 = self.data.test_dl     if is_test else self.data.val_dl
    168         dl2 = self.data.test_aug_dl if is_test else self.data.aug_dl
--> 169         preds1,targs = predict_with_targs(self.model, dl1)
    170         preds1 = [preds1]*math.ceil(n_aug/4)
    171         preds2 = [predict_with_targs(self.model, dl2)[0] for i in tqdm(range(n_aug), leave=False)]

~/fastai/fastai/model.py in predict_with_targs(m, dl)
    115     if hasattr(m, 'reset'): m.reset()
    116     res = []
--> 117     for *x,y in iter(dl): res.append([get_prediction(m(*VV(x))),y])
    118     preda,targa = zip(*res)
    119     return to_np(torch.cat(preda)), to_np(torch.cat(targa))

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    222         for hook in self._forward_pre_hooks.values():
    223             hook(self, input)
--> 224         result = self.forward(*input, **kwargs)
    225         for hook in self._forward_hooks.values():
    226             hook_result = hook(self, input, result)

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     65     def forward(self, input):
     66         for module in self._modules.values():
---> 67             input = module(input)
     68         return input
     69 

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    222         for hook in self._forward_pre_hooks.values():
    223             hook(self, input)
--> 224         result = self.forward(*input, **kwargs)
    225         for hook in self._forward_hooks.values():
    226             hook_result = hook(self, input, result)

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     65     def forward(self, input):
     66         for module in self._modules.values():
---> 67             input = module(input)
     68         return input
     69 

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    222         for hook in self._forward_pre_hooks.values():
    223             hook(self, input)
--> 224         result = self.forward(*input, **kwargs)
    225         for hook in self._forward_hooks.values():
    226             hook_result = hook(self, input, result)

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
     51 
     52     def forward(self, input):
---> 53         return F.linear(input, self.weight, self.bias)
     54 
     55     def __repr__(self):

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
    551     if input.dim() == 2 and bias is not None:
    552         # fused op is marginally faster
--> 553         return torch.addmm(bias, input, weight.t())
    554 
    555     output = input.matmul(weight.t())

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/variable.py in addmm(cls, *args)
    922         @classmethod
    923         def addmm(cls, *args):
--> 924             return cls._blas(Addmm, args, False)
    925 
    926         @classmethod

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/variable.py in _blas(cls, args, inplace)
    918             else:
    919                 tensors = args
--> 920             return cls.apply(*(tensors + (alpha, beta, inplace)))
    921 
    922         @classmethod

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/_functions/blas.py in forward(ctx, add_matrix, matrix1, matrix2, alpha, beta, inplace)
     24         output = _get_output(ctx, add_matrix, inplace=inplace)
     25         return torch.addmm(alpha, add_matrix, beta,
---> 26                            matrix1, matrix2, out=output)
     27 
     28     @staticmethod

RuntimeError: size mismatch at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generic/THCTensorMathBlas.cu:243

here’s the code I’m using

def get_model_predictions(f_model, name):
    tfms = tfms_from_model(f_model, sz)
    data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}/labels.csv',bs=bs, tfms=tfms, val_idxs=idx, test_name='test', 
                                        suffix='.jpg')
    m = f_model(True)
    bm = BasicModel(m.cuda(), name=name)
    learn = ConvLearner(data, bm)
    predictions, y = learn.TTA()
    save_array('predictions_' + name, predictions)
    return predictions, y

This works for resnet34, inception_4 and resnext50, but not for resnext101_64 or resnext101.
I tried removing the tmp folder, but it doesn’t help.

bushaev · November 23, 2017, 12:08am

Has everyone with a loss of 0.1 or less just been using the whole Stanford Dogs Dataset for training?
Is this considered cheating?

pierreguillou · November 23, 2017, 12:24pm

Hello,

Regarding data augmentation, I’m wondering what happens to images with a row and/or column size smaller than 299 when we set sz=299.

Why? Because in the train set of the Dog Breed competition, 10% of images have a row size < 299.

To view a SPECIFIC image AFTER data augmentation, I’m adapting the following code from the “Dogs vs Cats” Jupyter notebook:

sz = 299
bs= 64

tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)

def get_augs():
    data = ImageClassifierData.from_csv(PATH, 'train', label_csv, test_name='test',
                                    val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)
    x,_ = next(iter(data.aug_dl))
    return data.trn_ds.denorm(x)[1]

ims = np.stack([get_augs() for i in range(6)])

plots(ims, rows=2)

This code works well, but how can I use it on a specific image?

sermakarevich · November 23, 2017, 2:29pm

A moment of glory for the guys, nice feeling to be at the top. I just wonder, why not absolute 0?

vikbehal · November 23, 2017, 2:33pm

9 hours straight on p2.xlarge with inception_4 and resnext101_64, without unfreeze!
multiclass_loss:

inception_4: 0.19768 - Position 50
resnext101_64: worse than inception_4 - Position: no improvement
ensemble: 0.17038 - Position 29

#ensemble is bliss.

I can see many fastai students doing well, the best being @sermakarevich with a loss of 0.13987, along with @jamesrequa 0.15319, @thiago .15421, @rikiya .15387, @A_TF57 .15945, @lgvaz .16052, @suvash .16059, @z0k, @bushaev .16094, @rishubhkhurana .16567. Guys, apart from multiple epochs, what would you recommend to improve the score?

Thank you, everyone, for being so inspiring!

sermakarevich · November 23, 2017, 2:41pm

Congrats, that’s a nice score. Tip for score improvement: just read this thread, everything you need is already here. Both how to get < 0.14 and even how to get < 0.001.

vikbehal · November 23, 2017, 2:53pm

Using a completely different dataset for training?

rishubhkhurana · November 23, 2017, 5:20pm

I haven’t visited this competition in a while, but I guess you could try training on the entire data set (not just on 80% of it), if you haven’t done so. Also create different models with different sz parameters, especially with the inception net. Then ensemble them.

jeremy · November 23, 2017, 6:34pm

It simply scales it up (with linear interpolation).
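
As a rough illustration of what scaling up with linear interpolation means, here is a small NumPy sketch for a single-channel image. Which routine fastai actually calls for this is not shown in this thread, so resize_bilinear is a hypothetical helper, and the corner-aligned sampling grid is a simplifying assumption.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D array by interpolating linearly along rows and columns."""
    in_h, in_w = img.shape
    img = img.astype(float)
    # Positions of the output pixels in input coordinates (corners aligned).
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]   # vertical blend weights
    wx = (xs - x0)[None, :]   # horizontal blend weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

small = np.array([[0, 1],
                  [2, 3]])
big = resize_bilinear(small, 3, 3)
print(big)
```

New pixels that fall between original ones get a weighted mix of their neighbours, so a small image becomes larger and smoother but gains no new detail.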

To apply to one image, see How do we use our model against a specific image?

suvash · November 23, 2017, 7:38pm

Seems like the pretrained weights for nasnet (pytorch) are not available on the servers right now, the ones defined here.

https://github.com/fastai/fastai/blob/master/fastai/models/nasnet.py#L10

Somehow it’s been removed upstream.

https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/nasnet.py#L10

Any other ways to get hold of this file? @sermakarevich maybe you could share the ones cached on your machine? (It’s downloaded to $HOME/.torch/models/nasnetalarge-dc8c1432.pth)
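
A quick way to check whether a copy is already in that cache location, as a sketch that assumes the weights live under .torch/models in the home directory as described above. cached_weights_path is a hypothetical helper name.

```python
from pathlib import Path

def cached_weights_path(filename, home=None):
    """Build the expected cache path for a weights file under <home>/.torch/models."""
    base = Path(home) if home is not None else Path.home()
    return base / ".torch" / "models" / filename

path = cached_weights_path("nasnetalarge-dc8c1432.pth")
print(path, "exists" if path.exists() else "missing")
```

If someone shares the file, placing it at the printed path should in principle let the loader find it without downloading, assuming the loader looks only at that location.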

suvash · November 23, 2017, 7:42pm

Congrats there! Regarding the techniques, what @sermakarevich said.

I’ve only ensembled a bunch of various models. Haven’t had time for anything more.

jamesrequa · November 23, 2017, 7:55pm

I have the nasnet weights too if you still need them