Dog Breed Identification challenge

Yes please @jamesrequa ! :bowing_man: Can you send me a link to download it? I will then modify the Python code to use it.

I think the best way forward might be to upload it to files.fast.ai and modify nasnet.py to read from there instead. Ping @jeremy

Ah. I’d opened an issue on the author’s repo, and the files are back again now.

Try once again; I just checked and it worked for me.

No worries. The author just put the files back after I mentioned it in the GitHub issue above; it was most likely an accident. It works now.


Also, training Nasnet is slööööööööw!

Yep, linear regression is waaaay faster :joy:


It’s much faster if you install the latest pytorch from source, FYI.


EDIT: even the number of classes is conserved in both the training and validation sets: 120 classes!
get_cv_idxs() = magic function? :slight_smile:

Note: I used the code below to check the number of classes in both the training and validation sets:

import numpy as np

# training set: class labels and how many images of each class
unique, counts = np.unique(data.trn_ds.y, return_counts=True)
dict(zip(unique, counts))   # len(unique) == 120 classes

# validation set: same check
unique, counts = np.unique(data.val_ds.y, return_counts=True)
dict(zip(unique, counts))   # len(unique) == 120 classes again

Hello,

in the Dog Breed competition, we use get_cv_idxs() to randomly create a validation set, as follows:

label_csv = f'{PATH}labels.csv'
n = len(list(open(label_csv))) - 1   # number of labelled images (minus the CSV header)
val_idxs = get_cv_idxs(n)            # random 20% of the indices by default
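
For reference, get_cv_idxs() in the fastai library is roughly the following (a sketch from memory; check fastai/dataset.py for the exact version):

import numpy as np

def get_cv_idxs(n, cv_idx=0, val_pct=0.2, seed=42):
    # shuffle all indices with a fixed seed, then take one fold of val_pct of them
    np.random.seed(seed)
    n_val = int(val_pct * n)
    idx_start = cv_idx * n_val
    idxs = np.random.permutation(n)
    return idxs[idx_start:idx_start + n_val]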

I wanted to check, through histograms, how similar our training and validation sets are after the data is adapted to our model by the following code:

tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(PATH, 'train', label_csv, test_name='test',
                                    val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)

Below are the histograms of row and column sizes under 1000 pixels. I’m quite surprised by how perfectly similar the training and validation sets are (same thing if you take all rows and columns).
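
(For reference, here is a minimal sketch of one way to produce such size histograms; it assumes PIL and matplotlib, and that the filenames are available as data.trn_ds.fnames / data.val_ds.fnames:)

import os
import matplotlib.pyplot as plt
from PIL import Image

def row_col_sizes(fnames, path=PATH, limit=1000):
    # PIL's Image.size is (width, height); keep sizes under `limit` pixels
    sizes = [Image.open(os.path.join(path, f)).size for f in fnames]
    rows = [h for w, h in sizes if h < limit]
    cols = [w for w, h in sizes if w < limit]
    return rows, cols

trn_rows, trn_cols = row_col_sizes(data.trn_ds.fnames)
val_rows, val_cols = row_col_sizes(data.val_ds.fnames)

plt.hist(trn_rows, bins=50, alpha=0.5, label='train rows')
plt.hist(val_rows, bins=50, alpha=0.5, label='val rows')
plt.legend()
plt.show()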

Does it mean that using get_cv_idxs() will ALWAYS give a validation set similar to the training set, or are we just lucky here?


Thank you for your answer @jeremy about small images (under size sz) being scaled up, and for the link on how to make predictions on a single image.

To display a specific image and see the effects of data augmentation on it, I found I can change the index num in the following code:

def get_augs(num):
    # recreate the data object and grab one batch from the augmentation dataloader
    data = ImageClassifierData.from_csv(PATH, 'train', label_csv, test_name='test',
                                        val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)
    x, _ = next(iter(data.aug_dl))
    return data.trn_ds.denorm(x)[num]   # denormalised augmented image at index num
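
Following the Lesson 1 pattern, it can then be used like this (a sketch, assuming fastai’s plots() helper is imported):

import numpy as np

# six random augmentations of the same image (index 0 here)
ims = np.stack([get_augs(0) for i in range(6)])
plots(ims, rows=2)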

I would like to know what your training + testing time is on AWS (p2.xlarge):

  • for the Dog Breed competition
  • with the resnext101_64 model
  • learning rate = 1e-2
  • validation set = 20% of the training set
  • precompute=False except for the first fit

Mine is almost 4h… (not counting the time spent finding the learning rate, trials, etc.). How can I reduce it?


I’m getting an accuracy of 93% on the breed identification. Now I’m trying to analyze my results (using the Lesson 1 lecture as an example). Admittedly, I know little to nothing about dog breeds. However, the following are correctly classified dog breeds, yet they don’t look anything alike. Am I analyzing my data correctly?

Thanks.

I tried learn.unfreeze() and used differential learning rates. In my case it didn’t help increase my accuracy.

Without differential learning rates, using resnext_101, I was getting around 93%. Would you have other general suggestions? (I’ve tried all I know…)

Yep, they look different because you have 120 classes, not 2. To see why an image was incorrectly classified, you need to compare an image of the real class with one of the predicted class; they are going to be similar. At least in my case they were very similar.

93% is probably the maximum resnext_101 can do on its own.

  1. Might you have some sample code for that?
  2. As for picking appropriate models, is it just trying various models until I find one with great results? So far I’ve used the models we’ve covered in class; I also used resnet34. How can I go about finding models to use?

Sorry, I don’t think I have. I had 5 predictions from 5 different models and was manually checking random pics when I observed 3 predictions of one class and 2 of another.

When we are newbies, I think that’s how things work. In this specific case you can simply select the model with the best accuracy on ImageNet, and it should have the best accuracy on dog breeds, which are a subset of the ImageNet classes.

Check over on #part1v2-beg; there’s some code there contributed by @alessa.


Hi @vikbehal,

I have only submitted a couple of simple baseline models for the Dog Breeds challenge, so I would ask @bushaev for advice as it’s been all him so far :slight_smile:

For those using Nasnet, did you encounter the following error in the “Precompute” section when running learn = ConvLearner.pretrained(arch, data, precompute=True)?

I’m getting a “ValueError: array trailing dimensions do not match with self”, and so far Google isn’t turning up anything related.

The same notebook works with Resnext101.

One of the differences between nasnet and resnext is that for nasnet you need to define:

def nasnet(pre): return nasnetalarge(pretrained='imagenet' if pre else None)
model_features[nasnet] = 4032*2   # nasnetalarge outputs 4032 features; fastai's concat pooling doubles them

# nasnet expects 0.5/0.5 normalisation stats rather than the usual imagenet_stats
stats = ([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
tfms = tfms_from_stats(stats, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
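With those definitions in place, the rest of the notebook should work as before; a minimal sketch, assuming nasnetalarge is imported from Cadene’s pretrained-models.pytorch package (or the nasnet.py mentioned earlier):

from pretrainedmodels import nasnetalarge  # pip install pretrainedmodels

# rebuild the data object with the nasnet-specific tfms from above
data = ImageClassifierData.from_csv(PATH, 'train', label_csv, test_name='test',
                                    val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)
learn = ConvLearner.pretrained(nasnet, data, precompute=True)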

@sermakarevich Thanks a lot for the hints.

I’d like to learn to fix these errors on my own: could you share how/where you found this solution?