As you can tell from my other post, I am trying to find ways to (lawfully) increase my training data. The idea of using all of the data was recommended by @jeremy, but I still have not worked out how to do that!
Another thing that puzzles me (if I understand it correctly) is that our augmented data is only used in testing via the .TTA method. I thought that we should have the augmented data available to increase the size of our training set!
Just to clarify, you are just trying to use all of your data as train and none as validation? If so, just set val_idxs=[0]. This isn't exactly what you want, but it is the easiest way to do things currently. It puts one image into the validation set and the rest into training. You won't get very much feedback at this point though, so it's important that you already have your model set up the way you want it before making this change.
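A minimal sketch of that workaround, assuming the from_csv workflow from the lessons (PATH, label_csv, and tfms are placeholders for whatever you already have defined):

    # put a single image into validation and everything else into training
    data = ImageClassifierData.from_csv(PATH, 'train', label_csv, tfms=tfms,
                                        val_idxs=[0], test_name='test')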
To create the valid set, I randomly picked 1/5th of the images of each breed and put them into a valid/breed_name folder. The test set is left as it was when unzipped from the files downloaded from Kaggle.
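For anyone who wants to script that kind of split, here is a rough sketch of one way to do it (the folder names are assumptions; it assumes train/breed_name folders already exist):

    import os, random, shutil

    train_dir, valid_dir = 'train', 'valid'
    random.seed(42)
    for breed in os.listdir(train_dir):
        files = os.listdir(os.path.join(train_dir, breed))
        os.makedirs(os.path.join(valid_dir, breed), exist_ok=True)
        # move a random 1/5th of each breed's images into valid/breed_name/
        for fname in random.sample(files, len(files) // 5):
            shutil.move(os.path.join(train_dir, breed, fname),
                        os.path.join(valid_dir, breed, fname))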
It will use the augmented data in training only if precompute is set to False when creating the learner. If precompute is set to True then things get a little murkier for me. I think at that point the model is loaded with the precomputed activations and those won't change. I would love to hear somebody else answer that better than me though, because I am still trying to work through precompute=True vs precompute=False and freeze() vs unfreeze().
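Roughly, the sequence from the lessons looks something like this (just a sketch of my understanding, with arch and data assumed to be defined already):

    learn = ConvLearner.pretrained(arch, data, precompute=True)
    learn.fit(1e-2, 2)            # trains only the new head on the cached activations
    learn.precompute = False      # from here on, augmented images are actually used
    learn.fit(1e-2, 3, cycle_len=1)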
Augmentation is definitely a way to increase the size of your training set. It takes each picture and looks at it with different rotations, zoom levels, colorings, etc., which helps make a model that is more general and better at predicting the test data. Maybe there is an image of the same dog breed, but the camera is slightly closer or farther away; augmentation lets you simulate that image possibility.
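In the library from the course, augmentation is switched on through the transforms, along these lines (arch and sz are assumed to be whatever you already use):

    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_paths(PATH, tfms=tfms)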
Yep, looks like that is your problem. You are telling TTA that you want to predict on the test set, but you haven't said where that is. If you press Shift + Tab while you are in the from_paths command, you will see the arguments it can take and what the defaults are. Train and valid both have default values, but test doesn't, so you have to do something like this (your other stuff looks fine, this is just from mine):
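    # sketch of the call with the test folder named; PATH and tfms are assumed
    # to be set up as in the lessons
    data = ImageClassifierData.from_paths(PATH, tfms=tfms, test_name='test')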
Alright, I can't figure out at all how to create a CSV file to submit to Kaggle. I can't find anything in the lecture 2 video either, so could someone help me with the code for that? Thanks!
I think you have answered my question re augmentation - it DOES use augmented images when you set precompute=False. Is this necessary because, when using precompute=True, the model can only deal with images it has previously seen? If that is the case, then what purpose would there be in doing any warm-up training with precompute=True?
Which competition are you referring to? I assume, since you are posting to this thread, that it's for Dog Breed? The code can vary quite a bit depending on which one it is.
Generally speaking, I always start by using the sample submission CSV that Kaggle provides for each competition as my df, with a simple line like submission = pd.read_csv('sample_submission.csv'). From there you just replace the ids in the submission with your test ids (so they are sorted correctly and aligned with your predictions), which you can grab from data.test_dl.dataset.fnames. Then you fill in the rest of the columns in the df with your predictions.
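Put together, that flow looks roughly like this sketch (fastai-0.7-era calls; 'sample_submission.csv' and the output file name are assumptions you would adjust for your own setup):

    import numpy as np
    import pandas as pd

    # predict on the test set with test-time augmentation, then average the augmented predictions
    log_preds, _ = learn.TTA(is_test=True)
    probs = np.mean(np.exp(log_preds), 0)

    submission = pd.read_csv('sample_submission.csv')
    # replace the ids so the rows line up with the order of the test predictions
    submission['id'] = [f.split('/')[-1].split('.')[0] for f in data.test_dl.dataset.fnames]
    # fill the breed columns with the predicted probabilities
    submission[data.classes] = probs
    submission.to_csv('my_submission.csv', index=False)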
Check out this link. This really helped me get a submission generated. Don't worry so much about the fact that it is for vgg19; focus on the exporting-to-CSV steps near the bottom.