Dog Breed Identification challenge

(Suvash) #304

Yes please, @jamesrequa! :bowing_man: Can you send me a download link? I will then modify the Python code to use it.

I think the best going forward might be to upload it to , and modify the to read from there instead. ping @jeremy

(Suvash) #305

Ah. I’d opened an issue on the author’s repo and it’s back again now.

(sergii makarevych) #306

Try once again, I just checked and it worked for me.

(Suvash) #307

No worries. The author just put the files back after I mentioned it in the GitHub issue above; it was most likely an accident. It works now.

(Suvash) #308

Also, training Nasnet is slööööööööw !

(sergii makarevych) #309

Yep, linear regression is waaaay faster :joy:

(Jeremy Howard) #310

It’s much faster if you install the latest pytorch from source, FYI.

(Pierre Guillou) #311

EDIT: even the number of classes is preserved in both the training and validation sets: 120 classes!
get_cv_idxs() = magic function? :slight_smile:

Note: I used the code below to check the number of classes in both the training and validation sets:

# training set
unique, counts = np.unique(data.trn_ds.y, return_counts=True)
dict(zip(unique, counts)) 

# validation set
unique, counts = np.unique(data.val_ds.y, return_counts=True)
dict(zip(unique, counts)) 


In the Dog Breed competition, we use get_cv_idxs() to create a random validation set as follows:

label_csv = f'{PATH}labels.csv'
n = len(list(open(label_csv)))-1
val_idxs = get_cv_idxs(n) 

I wanted to use histograms to check how similar our training and validation sets are after the data has been prepared for our model with the following code:

tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(PATH, 'train', label_csv, test_name='test',
                                    val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)

Below are the histograms for row and column sizes under 1000. I’m quite surprised by how closely the training and validation sets match (the same holds if you take all rows and columns).

Does this mean that using get_cv_idxs() will ALWAYS give a validation set similar to the training set, or are we just lucky here?
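
For what it's worth, here is a minimal sketch of why a uniformly random 20% split tends to mirror the training set. It does not use fastai: `random_val_idxs` is a hypothetical stand-in for a seeded random split, and the labels are synthetic (120 evenly sized classes), so treat it as an illustration rather than a description of what get_cv_idxs() does internally.

```python
import random
from collections import Counter

def random_val_idxs(n, val_pct=0.2, seed=42):
    """Hypothetical stand-in: pick int(val_pct * n) row indices uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(n), int(val_pct * n))

# Synthetic labels: 120 classes, 85 rows each (an assumption, not the real labels.csv).
n_classes, per_class = 120, 85
labels = [i % n_classes for i in range(n_classes * per_class)]

val_idxs = set(random_val_idxs(len(labels)))
val_counts = Counter(labels[i] for i in val_idxs)
trn_counts = Counter(y for i, y in enumerate(labels) if i not in val_idxs)

# Number of distinct classes in each split
print(len(trn_counts), len(val_counts))
# Spread of per-class counts in the validation split
print(min(val_counts.values()), max(val_counts.values()))
```

In this synthetic setup, the chance that a random 20% sample misses a given class entirely is roughly 0.8^85, which is below one in a hundred million, so seeing every class in both splits is expected rather than lucky. The per-class counts still vary from split to split; only a stratified split would pin them down exactly.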

(Pierre Guillou) #312

Thank you for your answer @jeremy about small images (under sz size) that are scaled up and for the link to how to make predictions against one image.

To display a specific image and see the effects of data augmentation on it, I found that you can change the index num in the following code:

def get_augs(num=0):
    # num: index of the image to return from the first batch of data.aug_dl
    data = ImageClassifierData.from_csv(PATH, 'train', label_csv, test_name='test',
                                        val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)
    x,_ = next(iter(data.aug_dl))
    return data.trn_ds.denorm(x)[num]

(Pierre Guillou) #313

I would like to know your training + testing time on AWS (p2.xlarge):

  • for the Dog Breed competition
  • with the resnext101_64 model
  • learning rate = 1e-2
  • validation set = 20% of the training set
  • precompute=False except for the first fit

Mine is almost 4h (not counting the time for finding the learning rate, trials, etc.). How can I reduce it?

(Sabelo Mhlambi) #314

I’m getting an accuracy of 93% on the breed identification. Now I’m trying to analyze my results (using the Lesson 1 lecture as an example). Admittedly, I know little to nothing about dog breeds. However, the following are correctly classified dog breeds, yet they don’t look anything alike. Am I analyzing my data correctly?


(Sabelo Mhlambi) #315

I tried learn.unfreeze() and used differential learning rates. In my case it didn’t help increase my accuracy.

Without differential learning rates, using resnext_101, I was getting around 93%. Would you have other general suggestions? (I’ve tried all I know…)

(sergii makarevych) #316

Yep, they look different because you have 120 classes, not 2. To see why an image was incorrectly classified, you need to compare images of the real class with images of the predicted class. They are going to be similar. At least in my case they were very similar.

93% is probably the most that resnext_101 can do on its own.

(Sabelo Mhlambi) #317

  1. Might you have some sample code for that?
  2. For choosing appropriate models, is it just a matter of trying various models until I find one with great results? So far I’ve used the models we’ve covered in class, and I also used resnet34. How can I go about finding models to use?

(sergii makarevych) #318

Sorry, I don’t think I have any. I had 5 predictions from 5 different models and manually checked random pictures where I saw 3 predictions of one class and 2 predictions of another.
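
The manual check could be narrowed down with something like the sketch below. It is not code I actually ran: `split_votes` is a hypothetical helper, the input format (one dict per model, mapping image id to predicted class) is an assumption, and the predictions are made up.

```python
from collections import Counter

def split_votes(preds_per_model):
    """Return {image_id: (majority, minority)} for images where the votes split 3-2.

    preds_per_model: list of dicts, one per model, mapping image id -> predicted class.
    """
    result = {}
    for image_id in preds_per_model[0]:
        votes = Counter(p[image_id] for p in preds_per_model)
        ranked = votes.most_common()
        # Keep only images where exactly two classes received 3 and 2 votes.
        if [count for _, count in ranked] == [3, 2]:
            result[image_id] = (ranked[0][0], ranked[1][0])
    return result

# Made-up predictions from five hypothetical models for three images.
preds = [
    {"a.jpg": "beagle", "b.jpg": "pug", "c.jpg": "collie"},
    {"a.jpg": "beagle", "b.jpg": "pug", "c.jpg": "collie"},
    {"a.jpg": "beagle", "b.jpg": "pug", "c.jpg": "border_collie"},
    {"a.jpg": "basset", "b.jpg": "pug", "c.jpg": "collie"},
    {"a.jpg": "basset", "b.jpg": "pug", "c.jpg": "shetland_sheepdog"},
]
print(split_votes(preds))  # prints {'a.jpg': ('beagle', 'basset')}
```

The images it returns are the ones worth opening side by side with examples of both candidate classes.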

When we are newbies, I think that’s how things work. In this specific case you can simply select the model with the best accuracy on ImageNet, and it should have the best accuracy on dog breeds, which are a subset of the ImageNet images.

(Jeremy Howard) #319

Check over on #part1v2-beg - there’s some code there contributed by @alessa
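
That code isn't reproduced here. As a rough, separate sketch of the comparison discussed earlier in the thread (actual class vs. predicted class), one could tally the misclassified pairs and then look at images from both classes of the most frequent pair. `most_confused` is a hypothetical helper and the labels are made up.

```python
from collections import Counter

def most_confused(y_true, y_pred, top=3):
    """Count (actual, predicted) pairs for misclassified items, most frequent first."""
    pairs = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return pairs.most_common(top)

# Made-up labels purely for illustration.
y_true = ["husky", "husky", "malamute", "pug", "husky", "pug"]
y_pred = ["malamute", "husky", "malamute", "pug", "malamute", "boxer"]
print(most_confused(y_true, y_pred))
# prints [(('husky', 'malamute'), 2), (('pug', 'boxer'), 1)]
```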

(Zarak) #320

Hi @vikbehal,

I have only submitted a couple of simple baseline models for the Dog Breeds challenge, so I would ask @bushaev for advice as it’s been all him so far :slight_smile:

(Eric Perbos-Brinck) #321

For those using Nasnet, did you encounter the following error in the “Precompute” section, when running learn = ConvLearner.pretrained(arch, data, precompute=True) ?

I’m getting “ValueError: array trailing dimensions do not match with self”, and Google hasn’t turned up anything related so far.

The same notebook works with Resnext101.

(sergii makarevych) #322

One of the differences between nasnet and resnext is that for nasnet you need to define:

def nasnet(pre): return nasnetalarge(pretrained = 'imagenet' if pre else None)
stats = ([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
tfms = tfms_from_stats(stats, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
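
To make the stats line concrete, here is a minimal sketch, assuming the stats are applied as the usual per-channel (x - mean) / std to pixel values already scaled to [0, 1]. With mean 0.5 and std 0.5 that maps inputs to [-1, 1]. The helper names are made up for illustration; this is not fastai code.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Standard per-channel normalization: (x - mean) / std."""
    return (pixel - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Inverse of normalize, e.g. for displaying an image again."""
    return value * std + mean

for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
# prints:
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```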

(Eric Perbos-Brinck) #323

@sermakarevich Thanks a lot for the hints.

I’d like to learn to fix those errors on my own: could you share how/where you found this solution?