Dog Breed Identification challenge

How do I check the most correct predictions for each breed while analysing the results?

In dogs vs cats it was a binary classification problem, so I could call this line because I knew 1 was a dog.

plot_val_with_title(most_by_correct(1, True), "Most correct dogs")

But I'm wondering how to do this when there are multiple classes, and also how to display the title for each.

Thanks.

@jeremy Thanks, got it! Actually I saw a bunch of zeros as the 2nd item; now it totally makes sense :slight_smile:

You’ll need to write that code yourself - it would be a great exercise, actually. Let us know if you try and get stuck!
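
For anyone attempting this, here is a minimal sketch of one way to approach it in plain NumPy. It is not the course's implementation: `most_correct_by_class`, the toy `probs` array, the labels `y`, and the breed names are all made up for illustration. It assumes `probs` holds one row of class probabilities per image and `y` holds the true class index of each image.

```python
import numpy as np

def most_correct_by_class(probs, y, class_idx, n=4):
    """Return indices of the n most confident correct predictions for one class."""
    preds = probs.argmax(axis=1)
    # Keep rows that truly belong to the class and were predicted as that class.
    idxs = np.where((preds == class_idx) & (y == class_idx))[0]
    # Sort those rows by the probability assigned to the class, highest first.
    order = np.argsort(-probs[idxs, class_idx])
    return idxs[order][:n]

# Toy data (hypothetical): 5 images, 3 breeds.
classes = ["beagle", "boxer", "pug"]
probs = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.90, 0.05, 0.05],
    [0.30, 0.30, 0.40],
    [0.60, 0.30, 0.10],
])
y = np.array([0, 1, 0, 2, 1])

for i, name in enumerate(classes):
    idxs = most_correct_by_class(probs, y, i)
    print(f"Most correct {name}: {idxs.tolist()}")
```

Looping over the class names like this also gives a per-breed title string that could be passed to a plotting helper.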

Maybe a silly question. Why do we need to take np.exp of the prediction?

That gives us the probabilities.

In the code below, why are we choosing 300 as the size threshold in the if condition?

def get_data(sz, bs):
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom = 1.1)
    data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', 
                                    test_name ='test', val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)
    return data if sz>300 else data.resize(340, 'tmp')

Most pytorch models return the log of the probabilities.
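
A small NumPy sketch of why `np.exp` recovers probabilities, assuming the model's final layer is a log-softmax. The `log_softmax` helper and the `logits` values here are made up for illustration and are not taken from the lesson notebook.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Hypothetical raw scores for 2 images and 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])

log_preds = log_softmax(logits)   # what a log-softmax model would output
probs = np.exp(log_preds)         # undo the log to get probabilities

print(log_preds)                  # all values are <= 0
print(probs.sum(axis=1))          # each row sums to 1 (up to rounding)
```

Since `exp` is monotonic, the predicted class (`argmax`) is the same before and after; only the scale changes.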

Great question. Since we have max_zoom=1.1, I figured we should ensure our images are at least sz*1.1. And I figured resizing them to 340x340 would save plenty of time, and leave plenty of room to experiment.
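
To make the arithmetic concrete, here is a tiny sketch of the sz * max_zoom rule of thumb against the 340-pixel resized copies. The helper name `zoomed_side` and the list of sizes are assumptions chosen for illustration.

```python
def zoomed_side(sz, max_zoom=1.1):
    """Side length suggested by the sz * max_zoom rule of thumb."""
    return sz * max_zoom

RESIZED_SIDE = 340  # side length of the resized copies in get_data

for sz in (224, 299, 300, 350):
    fits = zoomed_side(sz) <= RESIZED_SIDE
    print(f"sz={sz}: wants about {zoomed_side(sz):.0f}px, "
          f"resized copies are enough: {fits}")
```

Under this rule, sz=300 wants roughly 330 pixels, which fits inside 340, while anything much larger does not, so the cutoff of 300 reads as a conservative round number.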

I created this Kernel so I could share some code here:

https://www.kaggle.com/kevinbird15/breaking-data-into-train-and-valid

This really helped me quickly break my train folder into train and valid and also broke everything into folders.

It didn’t really work great as a Kernel, but I didn’t know where else to share it. Hopefully it isn’t too annoying in their list of Kernels.
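
For readers who cannot open the kernel, here is a rough standard-library sketch of the general idea. It is not the kernel's actual code. It assumes a labels CSV with `id` and `breed` columns and `.jpg` files named after the id; the function name `split_into_folders` and the `valid_frac` and `seed` parameters are made up for illustration.

```python
import csv
import random
import shutil
from pathlib import Path

def split_into_folders(labels_csv, src_dir, out_dir, valid_frac=0.2, seed=42):
    """Copy images into out_dir/{train,valid}/<breed>/ based on a labels CSV."""
    with open(labels_csv, newline="") as f:
        rows = list(csv.DictReader(f))  # assumed columns: 'id', 'breed'

    # Shuffle reproducibly, then send the first chunk to the validation set.
    random.Random(seed).shuffle(rows)
    n_valid = int(len(rows) * valid_frac)

    for i, row in enumerate(rows):
        split = "valid" if i < n_valid else "train"
        dest = Path(out_dir) / split / row["breed"]
        dest.mkdir(parents=True, exist_ok=True)
        # Copy rather than move so the original folder stays untouched.
        shutil.copy(Path(src_dir) / f"{row['id']}.jpg", dest)

    return n_valid, len(rows) - n_valid
```

A call might look like `split_into_folders("labels.csv", "train", "data")`, which would produce one folder per breed under `data/train` and `data/valid`.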

If you use from_csv like we did in the last lesson you can skip all that :slight_smile:

Hi @jeremy, just a quick question: which method lets us skip the validation set entirely?

Any method should allow that. Just don’t set val_name in from_paths or don’t set val_idxs in from_csv.

In that case, would you explicitly set val_name="" since it has a default value?

Set it to None I think. Not sure I’ve actually tried it, so just yell if it doesn’t work.

Has anyone uploaded to Kaggle from Crestle? I am not sure how to get hold of the generated .csv file so I can upload it manually on the Kaggle site.

You can use kaggle-cli to upload, or use FileLink like so: https://stackoverflow.com/questions/24437661/retrieving-files-from-remote-ipython-notebook-server

Thanks Jeremy. Will do it using kaggle-cli. I see a slew of submissions from fastai students already. Amazing stuff. :slight_smile:

I somehow have four more images than the competition has so I have some work to figure that out. I think I should have a first submission tomorrow night as long as I am able to figure out why that is happening.

I figured out my issue and now have a completed submission.

I’m really happy with that as a starting point. I used resnet34 and haven’t really done anything to help the model yet. Thank you to everybody who has helped me get to this point. I can now confirm that it is possible to use from_paths, even though from what I’ve heard from everybody, from_csv is the better way to do it.

Log loss of each model on the whole training set (obtained through 5-fold CV; the score between folds might vary from less than 0.17 to 0.26):

inception_4_300 	 0.228
inception_4_350 	 0.211
inception_4_400 	 0.204
inception_4_450 	 0.223
inceptionresnet_2_300 	 0.239
inceptionresnet_2_350 	 0.217
inceptionresnet_2_400 	 0.215
inceptionresnet_2_450 	 0.222

Simply averaging all models except for inceptionresnet_2_300 and inception_4_450 gives 0.181 on the training set and 0.172 on the leaderboard.
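
As a rough illustration of what simple averaging means here, the sketch below averages the class probabilities of two made-up models and compares log loss. The arrays, the labels, and the helper names `log_loss` and `average_models` are assumptions for the example, not the poster's code or numbers.

```python
import numpy as np

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    probs = np.clip(probs, eps, 1 - eps)
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

def average_models(prob_list):
    """Element-wise mean of several (n_samples, n_classes) probability arrays."""
    return np.mean(np.stack(prob_list), axis=0)

# Hypothetical predictions from two models on 3 images, 3 classes.
y = np.array([0, 2, 1])
model_a = np.array([[0.8, 0.1, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]])
model_b = np.array([[0.6, 0.3, 0.1], [0.1, 0.1, 0.8], [0.2, 0.7, 0.1]])

ensemble = average_models([model_a, model_b])

print("model a :", log_loss(y, model_a))
print("model b :", log_loss(y, model_b))
print("ensemble:", log_loss(y, ensemble))
```

Because the negative log is convex, the averaged predictions never score worse than the mean of the individual log losses, which is one reason plain averaging tends to help on this metric.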
