Dog Breed Identification challenge

@jamesrequa That worked perfectly, thanks a lot :smiley:

1 Like

TTA returns a tuple. The 2nd item is the y values.
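For context, the unpacking pattern looks like this. This is a minimal sketch with a stand-in for `learn.TTA()` (the names and shapes here are assumptions; the real call needs a trained fastai learner):

```python
def tta_stub():
    """Stand-in for learn.TTA(): returns a (log_preds, y) tuple, like the real call."""
    log_preds = [[-0.1, -2.3], [-1.6, -0.2]]  # log-probabilities per image
    y = [0, 1]                                # the 2nd item: the y values
    return log_preds, y

log_preds, y = tta_stub()
print(y)  # the labels come back as the second element: [0, 1]
```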

2 Likes

@jeremy this makes sense for predicting on the validation set, but for the test set we don’t have any y values, so in that case shouldn’t learn.TTA(is_test=True) just return the predictions?

If you agree I could submit a PR for it. :slight_smile:

2 Likes

IIRC it just returns zeros for the y values for the test set - I figured it’s more convenient for it to be consistent, so you don’t have to write different code for different datasets.

2 Likes

How do I check the most correct predictions for each breed while analysing the results?

In Dogs vs. Cats it was a binary classification problem, so I could call this line because I knew class 1 was a dog.

plot_val_with_title(most_by_correct(1, True), "Most correct dogs")

But I’m wondering how to do this when there are multiple classes, and also how to display the title for each.

Thanks.

@jeremy Thanks, got it! Actually I saw a bunch of zeros as the 2nd item; now it totally makes sense :slight_smile:

You’ll need to write that code yourself - it would be a great exercise, actually. Let us know if you try and get stuck!

1 Like

Maybe a silly question. Why do we need to take np.exp of the prediction?

That gives us the probabilities.

In the code below, why are we choosing 300 as the size threshold in the if condition?

def get_data(sz, bs):
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv',
                                        test_name='test', val_idxs=val_idxs,
                                        suffix='.jpg', tfms=tfms, bs=bs)
    return data if sz > 300 else data.resize(340, 'tmp')

3 Likes

Most pytorch models return the log of the probabilities.
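Concretely, if the model’s final layer is a log-softmax, exponentiating recovers probabilities that sum to 1. A self-contained sketch using the standard library instead of np.exp (the three-class log-probabilities are made up for illustration):

```python
import math

# Hypothetical log-softmax output for one image over three classes
log_preds = [math.log(0.7), math.log(0.2), math.log(0.1)]

probs = [math.exp(lp) for lp in log_preds]  # same idea as np.exp(log_preds)
print(probs)       # ~[0.7, 0.2, 0.1], up to floating point
print(sum(probs))  # the probabilities sum to 1
```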

2 Likes

Great question. Since we have max_zoom=1.1, I figured we should ensure our images are at least sz*1.1. And I figured resizing them to 340x340 would save plenty of time, and leave plenty of room to experiment.
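The arithmetic behind that choice can be sketched as follows (the numbers come from the code above; the headroom calculation itself is just illustrative):

```python
max_zoom = 1.1
threshold = 300   # sizes above this skip the resize step
cache_size = 340  # pre-resized copies stored in 'tmp'

# With max_zoom=1.1, a crop of size sz needs source images of at least sz * 1.1
assert threshold * max_zoom <= cache_size   # 330 <= 340, so there's headroom
print(cache_size / max_zoom)  # largest sz the 340px cache can serve: ~309
```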

5 Likes

I created this Kernel so I could share some code here:

https://www.kaggle.com/kevinbird15/breaking-data-into-train-and-valid

This really helped me quickly break my train folder into train and valid and also broke everything into folders.

It didn’t really work great as a Kernel, but I didn’t know where else to share it. Hopefully it isn’t too annoying in their list of Kernels.

1 Like

If you use from_csv like we did in the last lesson you can skip all that :slight_smile:

1 Like

Hi @jeremy, just a quick question - what method allows not using a validation set at all?

1 Like

Any method should allow that. Just don’t set val_name in from_paths or don’t set val_idxs in from_csv.

2 Likes

In that case, would you explicitly set it to val_name="" since it has a default value?

Set it to None I think. Not sure I’ve actually tried it, so just yell if it doesn’t work.

Has anyone uploaded to Kaggle from Crestle? I am not sure how to get hold of the generated .csv file so I can upload it manually on the Kaggle site.

You can use kaggle-cli to upload, or use FileLink like so: https://stackoverflow.com/questions/24437661/retrieving-files-from-remote-ipython-notebook-server
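For example, after writing the submission file in a notebook cell, FileLink renders a clickable download link (the filename 'subm.csv' here is an assumption):

```python
from IPython.display import FileLink

# 'subm.csv' is a hypothetical submission file written earlier in the notebook
link = FileLink('subm.csv')
link  # as the last expression in a notebook cell, this renders a download link
```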

1 Like