Dog Breed Identification challenge

The progress bar bug seems to hit me on my home machine whenever I run code that uses it. It works the first time, then fails in any subsequent cells: the progress bar prints out a million lines and then the training code just stops running (GPU at 0% usage).

Has anyone else experienced this happening frequently?

Resetting the kernel often isn’t much fun :-/

It often happens after I use lr_find, and it always happens after an exception. I don’t see the problem outside those two situations.

I managed to jump from 85% accuracy to 93.5% accuracy just by switching from resnet34 -> resnext101_64!

You can jump to >95% accuracy by using nasnet.

How long did it take to train?

Don’t remember exactly, but something like 1–2 hours to precompute activations, and then with this approach another 30–60 minutes for 5-fold cross-validation.

Ohh smart… so you precomputed activations for each of the folds in your 5-fold CV?

Good thing for dog breed we aren’t fine-tuning… I bet it would take forever to run with learn.unfreeze()

And you got this result without data aug?

I precomputed them once: for train (all training images minus 1), validation (1 image), and test. And then I just changed the indexes and never precomputed activations again.
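If it helps anyone, the idea can be sketched like this — a minimal numpy-only illustration with made-up names, shapes, and fold logic, not the actual code from the thread:

```python
import numpy as np

# Cached activations for every image, computed once through the frozen
# conv layers; the array names and sizes here are illustrative stand-ins.
rng = np.random.default_rng(0)
n_images, n_features = 100, 512
activations = rng.normal(size=(n_images, n_features))  # cached once
labels = rng.integers(0, 120, size=n_images)

def fold_split(n, fold, n_folds=5):
    # Return (train_idx, val_idx) for one CV fold -- pure index
    # arithmetic, so nothing is recomputed between folds.
    idx = np.arange(n)
    val_idx = idx[idx % n_folds == fold]
    trn_idx = idx[idx % n_folds != fold]
    return trn_idx, val_idx

for fold in range(5):
    trn_idx, val_idx = fold_split(n_images, fold)
    x_trn, y_trn = activations[trn_idx], labels[trn_idx]
    x_val, y_val = activations[val_idx], labels[val_idx]
    # ...fit the classifier head on (x_trn, y_trn), score on (x_val, y_val)...
```

Each fold only slices the cached array, which is why the expensive forward pass through the conv layers happens exactly once.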

In the Dogs vs. Cats competition, Bojan (the winner) said his best model took about a week to train.

UPD: @lgvaz it is a weighted average of resnext101_64, inception_v4 and nasnet. With each model I predicted with 5-fold CV with 5 different seeds (75 models in total).

OK, I think I get it now. You were able to do this by following the steps you posted in the thread linked below, right?

So basically you precomputed activations on all of the data (except one image) and then just changed the indexes to split up the train/validation sets… right? Although it seems you had to create a custom function to do this in fastai :slight_smile:

Almost exactly right, except for this:

I joined it back to train :yum:

data_x = np.vstack([val_act, act])

Oh I missed that haha, amazing, good work!!

That’s really a lot of models… haha

I saw you talking in another topic about ensembling methods. I tried to use logistic regression on top of my classifiers, but it went really badly… Anyway, are you calculating the weights based on the CV loss?

Based on loss on my train set.
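For anyone wondering what "weights based on loss" can look like in practice, here is a minimal, self-contained sketch: a coarse grid search over convex weights for three models, minimizing log loss. The predictions are synthetic stand-ins, and the grid search is just one possible approach — the poster's actual method may differ.

```python
import numpy as np
from itertools import product

# Synthetic stand-ins for three models' predicted class probabilities.
rng = np.random.default_rng(1)
n, n_classes = 200, 5
y = rng.integers(0, n_classes, size=n)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

preds = [softmax(rng.normal(size=(n, n_classes))) for _ in range(3)]

def log_loss(y, p, eps=1e-15):
    # Mean negative log-probability of the true class.
    p = np.clip(p, eps, 1 - eps)
    return -np.log(p[np.arange(len(y)), y]).mean()

# Coarse grid search over convex weights (w0 + w1 + w2 = 1, all >= 0).
best_loss, best_w = np.inf, None
grid = np.linspace(0, 1, 11)
for w0, w1 in product(grid, repeat=2):
    w2 = 1.0 - w0 - w1
    if w2 < -1e-9:        # weights must stay non-negative
        continue
    w2 = max(w2, 0.0)
    blend = w0 * preds[0] + w1 * preds[1] + w2 * preds[2]
    loss = log_loss(y, blend)
    if loss < best_loss:
        best_loss, best_w = loss, (w0, w1, w2)
```

Since each single model is itself a grid point (e.g. weights (1, 0, 0)), the blend can never score worse than the best individual model on the set you search over.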

Super. I was planning to use this during the next batch of uninterrupted time I get. Were you able to use NASNet right off the example shared by Jeremy, or were there any gotchas?

From Jeremy’s example. It should be as simple as with any other model. Just a bit longer :yum:

Hi! Has anyone tried not removing the last FC layer?
I tried it with resnet34 and inception_4, but the results aren’t great and I have a couple of questions.

  1. resnet34 gives 1000 probabilities for each example, which is what I would expect; however, inception_4 gives 1536 of those. Why is that?
  2. resnet34 gave really horrible results; with inception_4 I got 0.93 accuracy and 0.23 log loss on the validation set, which is pretty good but doesn’t really compare to the results from people in the top 10. What can I do to improve?
    Some of my thoughts as well:
    I guess I could combine predictions from different models to get better results (could it help?).
    I trained logistic regression on top of the predictions; would an MLP work better?
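For anyone curious what "logistic regression on top of predictions" (stacking) looks like, here is a minimal numpy-only sketch. All data and names are made up, and a real stack should be fitted on out-of-fold predictions to avoid leaking training labels:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_classes = 300, 4
y = rng.integers(0, n_classes, size=n)

def fake_model_probs(strength):
    # Random probabilities that are mildly correlated with the labels,
    # standing in for a real base model's predictions.
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), y] += strength
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Stack two base models' probabilities into one feature vector per image.
X = np.hstack([fake_model_probs(1.0), fake_model_probs(0.5)])

# Multinomial logistic regression fitted by plain batch gradient descent.
W = np.zeros((X.shape[1], n_classes))
onehot = np.eye(n_classes)[y]
for _ in range(500):
    z = X @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    W -= 0.5 * X.T @ (p - onehot) / n

accuracy = (np.argmax(X @ W, axis=1) == y).mean()
```

An MLP meta-model is the same idea with a nonlinear model in place of the linear one; whether it helps depends on how much the base models' errors actually interact.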

I am trying to do a final fit with the entire training set before having the model predict the test set. I tried simply not setting val_idxs when calling from_csv, but when I call get_data I get the error below.

I see in the from_csv definition the default value is None. I have also tried passing an empty numpy array and I get the same error.


IndexError                                Traceback (most recent call last)
<ipython-input-38-464872a660af> in <module>()
----> 1 data = get_data(sz, bs)

<ipython-input-37-7d8c25715bc7> in get_data(sz, bs)
      1 def get_data(sz, bs):
      2     tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
----> 3     data = ImageClassifierData.from_csv(PATH, 'train' ,f'{PATH}labels.csv', test_name='test', val_idxs=[], suffix='.jpg', tfms=tfms, bs=bs)
      4     return data if sz > 300 else data.resize(340, 'tmp')

~/fastai/courses/dl1/fastai/dataset.py in from_csv(cls, path, folder, csv_fname, bs, tfms, val_idxs, suffix, test_name, continuous, skip_header, num_workers)
    348         """
    349         fnames,y,classes = csv_source(folder, csv_fname, skip_header, suffix, continuous=continuous)
--> 350         ((val_fnames,trn_fnames),(val_y,trn_y)) = split_by_idx(val_idxs, np.array(fnames), y)
    351 
    352         test_fnames = read_dir(path, test_name) if test_name else None

~/fastai/courses/dl1/fastai/dataset.py in split_by_idx(idxs, *a)
    361 def split_by_idx(idxs, *a):
    362     mask = np.zeros(len(a[0]),dtype=bool)
--> 363     mask[np.array(idxs)] = True
    364     return [(o[mask],o[~mask]) for o in a]

IndexError: arrays used as indices must be of integer (or boolean) type

You can use val_idxs=[0] to get rid of that error and have only one image in your validation set.
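For the curious, the root cause is that np.array([]) defaults to float64 dtype, and NumPy rejects float arrays as indices — exactly the IndexError in the traceback. A tiny reproduction, mirroring the mask line from split_by_idx:

```python
import numpy as np

mask = np.zeros(10, dtype=bool)
try:
    # np.array([]) has float64 dtype, and floats are not valid indices,
    # so this raises the same IndexError seen in the traceback above.
    mask[np.array([])] = True
    raised = False
except IndexError:
    raised = True

# A one-element integer list works -- at the cost of a one-image
# validation set, as suggested above.
mask[np.array([0])] = True
```

An empty integer array (e.g. np.array([], dtype=int)) would also index fine, but fastai's downstream code still expects a non-empty validation split.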

Looks like inception_4 is doing something different…

def inception_4(pre):
    return children(load_pre(pre, InceptionV4, 'inceptionv4-97ef9c30'))[0]

instead of…

def resnext50(pre): return load_pre(pre, resnext_50_32x4d, 'resnext_50_32x4d')

I changed inception_4 to:
def inception_test(pre): return load_pre(pre, InceptionV4, 'inceptionv4-97ef9c30')
and got 1001 outputs
