Kaggle Questions

(Jeremy Howard (Admin)) #34

As the error message says, ‘make sure the development packages of libxml2 and libxslt are installed’. They should already be part of Anaconda. Are you using our script to set up your AWS instance? It looks like you are either not using it or not connected to your instance, since your prompt above shows ‘kicho@PC08EYW5 ~’, whereas on our AWS setup your username would be ‘ubuntu’, not ‘kicho’.

On our AWS setup, you should not need to install any prerequisites to install kaggle-cli; just ‘pip install kaggle-cli’ should be enough.


(garima.agarwal) #35

I was getting an error:
ubuntu@ip-10-0-0-5:~/nbs$ kg download
Starting new HTTPS connection (1): www.kaggle.com
'NoneType' object has no attribute 'find_all'

There was a typo in the competition name: I was using dogs-vs-cats-redux-kernel-edition instead of dogs-vs-cats-redux-kernels-edition. Correcting the name fixed it.

Thanks @jeremy for pointing out!


(Kicho) #36

Thank you, Jeremy! It works now. I accessed an AWS instance using our script and then pip install kaggle-cli worked. :slight_smile:


(Swathi Shyam Sunder) #37


@rachel - Thank you for pointing me to the evaluation page, which has information about the submissions.
My question was about how to actually run the trained model on the test data. Was I right to call vgg.predict on the test data?
When I did this, the result was a tuple, which I converted to a NumPy array with numpy.asarray and then wrote to a CSV file.
I think something is wrong with this approach.


(wendydherin) #38

Remember that the shuffle parameter of the get_batches method defaults to True.


(Jeremy Howard (Admin)) #39

@swathi.ssunder you’re on the right track, but you’ll need to spend some time looking at the data structure that’s returned from predict(), and figuring out how to turn it into the structure Kaggle expects for a submission. You’ll need to write some code to make these changes.

Give it your best shot, and if you get stuck, let us know exactly what you’ve tried, what you’ve found so far, and what it is that you need to do next and don’t know how to do.
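A minimal sketch of that reshaping step, assuming the predictions are a NumPy array of shape (n_images, 2) whose columns follow Keras' alphabetical class order (cats = 0, dogs = 1), and assuming the competition wants an `id,label` CSV where `label` is the probability that the image is a dog. The helper name `write_submission` and the sample values are hypothetical.

```python
import csv
import io

import numpy as np


def write_submission(ids, probs, out):
    """Write an id,label CSV to the file-like object `out` (hypothetical helper).

    ids   -- sequence of integer image ids, aligned row-for-row with probs
    probs -- array of shape (n_images, 2)
    """
    probs = np.asarray(probs)
    if probs.ndim != 2 or probs.shape[1] != 2:
        raise ValueError("expected an (n_images, 2) array of class probabilities")
    if len(ids) != len(probs):
        raise ValueError("ids and probs must have the same length")
    # Assumption: column 1 is the 'dogs' class (alphabetical class order).
    dog_probs = probs[:, 1]
    writer = csv.writer(out)
    writer.writerow(["id", "label"])
    for image_id, p in zip(ids, dog_probs):
        writer.writerow([int(image_id), float(p)])


# Usage with made-up values for three test images:
buf = io.StringIO()
write_submission([1, 2, 3], [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]], buf)
print(buf.getvalue())
```

To write a real file instead of a buffer, open it with `open("submission.csv", "w", newline="")`, as the `csv` module documentation recommends. The ids must be in the same order as the prediction rows, which is why shuffling matters here.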


(Swathi Shyam Sunder) #40


@jeremy - thanks for clarifying. Sure, I'll do that.



I used scikit-learn’s StratifiedKFold and generated symlinks in Python to dynamically create the training and validation sets (and reset them if necessary). I hope that’s OK.
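A stdlib-only sketch of the same idea: hold out a fixed fraction of each class for validation and expose the split through symlinks, so the original images never move and the split can be rebuilt at will. Note that this swaps in a simple per-class random split for scikit-learn's `StratifiedKFold` (which produces k folds rather than one). The `<class>/` directory layout, the helper name `make_split`, and the 10% default are assumptions; `dst_dir` should live outside `src_dir`.

```python
import os
import random
import shutil


def make_split(src_dir, dst_dir, valid_frac=0.1, seed=0):
    """Link images from src_dir/<class>/ into dst_dir/{train,valid}/<class>/.

    Hypothetical helper. Each class is split separately, so the class
    proportions in valid match those in src_dir (a stratified split).
    Re-running deletes dst_dir and rebuilds the links from scratch.
    """
    rng = random.Random(seed)
    if os.path.exists(dst_dir):
        shutil.rmtree(dst_dir)  # removes the links only, not the original images
    for cls in sorted(os.listdir(src_dir)):
        cls_dir = os.path.join(src_dir, cls)
        if not os.path.isdir(cls_dir):
            continue
        files = sorted(os.listdir(cls_dir))  # sort first so the seed is reproducible
        rng.shuffle(files)
        n_valid = int(round(len(files) * valid_frac))
        for subset, names in (("valid", files[:n_valid]), ("train", files[n_valid:])):
            out_dir = os.path.join(dst_dir, subset, cls)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                os.symlink(os.path.abspath(os.path.join(cls_dir, name)),
                           os.path.join(out_dir, name))
```

Because only links are created, "resetting" the split is just a matter of calling `make_split` again, optionally with a different `seed`.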


(melissa.fabros) #42

Thanks for the tip! Also, are the file names the image ids?


(Jeremy Howard (Admin)) #43

@melissa.fabros yes that’s exactly right!
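Since each test file is named after its id, the ids can be recovered from a list of file paths, such as a Keras directory iterator's `filenames` attribute. A minimal sketch; the function name and the example paths (which assume the images sit in an `unknown/` subfolder) are hypothetical.

```python
import os


def ids_from_filenames(filenames):
    """Turn paths like 'unknown/1234.jpg' into integer ids like 1234, keeping order."""
    return [int(os.path.splitext(os.path.basename(f))[0]) for f in filenames]


print(ids_from_filenames(["unknown/10.jpg", "unknown/1.jpg", "unknown/100.jpg"]))
# → [10, 1, 100]
```

The ids come out in the same order as the input list, which is what keeps them aligned with the prediction rows as long as the generator did not shuffle.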


(bckenstler) #44

Just a heads up: batches.filenames always lists the files in directory order, regardless of whether the generator is set to shuffle. So if you don’t shuffle, obtaining the image ids is straightforward, but has anyone figured out how to obtain the image ids in the order that a shuffling iterator outputs them? I don’t see how to do it.


(Jeremy Howard (Admin)) #45

@bckenstler I think the correct answer is: don’t do that! :wink:
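To see why, here is a pure-Python simulation (no Keras involved; the filenames and the "model" are made up): the filenames list stays in directory order, while a shuffling iterator yields images in a permuted order, so pairing row i of the predictions with filenames[i] mislabels images. With shuffling off, as in the `get_batches` tip earlier in the thread, the two orders coincide.

```python
import random

filenames = [f"unknown/{i}.jpg" for i in range(1, 9)]  # directory order


def fake_predict(name):
    """Stand-in for a model: a deterministic score derived from the file id."""
    return int(name.split("/")[1].split(".")[0]) / 10


def run_iterator(names, shuffle, seed=0):
    """Return predictions in the order a (simulated) iterator would yield them."""
    order = list(range(len(names)))
    if shuffle:
        random.Random(seed).shuffle(order)  # what a shuffling generator does
    return [fake_predict(names[i]) for i in order]


for shuffle in (False, True):
    preds = run_iterator(filenames, shuffle)
    correct = sum(p == fake_predict(f) for f, p in zip(filenames, preds))
    print(f"shuffle={shuffle}: {correct}/{len(filenames)} rows paired correctly")
```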


(sethiavivek2006) #46

@all I was trying vgg.test on dogs-vs-cats-redux-kernels-edition on my AWS p2 large instance. I have followed all the steps in the lesson 2 lecture for creating separate directories. But when I run the step shown in the picture, execution gets stuck (indicated by the * preceding the statement) until I interrupt it. The following steps don't work, since the previous step did not finish. Any suggestions on how to solve this?


(Jeremy Howard (Admin)) #47

It’s likely to take up to 10 minutes, since it has to run the forward pass of the neural net on all 12,500 images. If it takes longer, make sure that it’s using your GPU (check the output of the ‘import utils’ line).


(sethiavivek2006) #48

@jeremy Yes, the previous time I didn't let it run for more than 10 minutes. I tried it again later and it worked nicely. Thank you.


(mattobrien415) #50

I’m plowing thru the Redux notebook, and getting stuck here. This is after successfully loading weights I built last night:

Not sure why – the path is correct, the directory holds the expected 12,500 jpgs.

It seems like the get_batches isn’t finding anything, thus the divide by zero?

I’m wracking my brain but can’t think of any reasons why this is happening. Any advice would be greatly appreciated.

EDIT: Strangely enough, after removing the +'test' from the path argument, everything seems fine. I don’t think there’s anything wrong with my directory structure, but I suppose all's well that ends well.


(janardhanp22) #51

Hi @mattobrien415

Have you created a directory named “unknown” under the test directory and moved all your test images into it?
That might resolve the ZeroDivisionError.
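A sketch of that restructuring step in Python, assuming the test images sit directly in the test directory as `.jpg` files. The helper name and the path in the usage comment are hypothetical. It only moves files lying directly in the test directory, so running it twice is harmless.

```python
import glob
import os
import shutil


def move_into_unknown(test_dir):
    """Move test_dir/*.jpg into test_dir/unknown/ and return how many were moved."""
    unknown_dir = os.path.join(test_dir, "unknown")
    os.makedirs(unknown_dir, exist_ok=True)
    moved = 0
    for path in glob.glob(os.path.join(test_dir, "*.jpg")):
        shutil.move(path, os.path.join(unknown_dir, os.path.basename(path)))
        moved += 1
    return moved


# move_into_unknown("data/redux/test")  # hypothetical path to your test folder
```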


(mattobrien415) #52

Thanks for the response, @janardhanp22

No, I didn’t do that. I suppose I didn’t realize there was a point to putting the images into an ‘unknown’ subdirectory. Why not just leave them in the ‘test’ directory?

It looks like the unknown directory just contains all the test data anyway?


(Jeremy Howard (Admin)) #53

The issue @janardhanp22 is referring to is that Keras’ generator needs to know what labels to use for the images, and it uses the sub-directory structure for that. Even if you don’t have labels, you still need the folder structure.
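A simplified, pure-Python model of the directory scan described here. This is not Keras' actual code; the helper name and the extension list are assumptions. Classes are the sorted subdirectory names, and only images inside those subdirectories are counted. With the test images lying loose in the test folder, the count is zero, which fits the ZeroDivisionError reported above; once they are moved into `unknown/`, they are found under a single dummy class.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumption: a reduced extension list


def scan_like_flow_from_directory(root):
    """Return (class_names, n_images) using a simplified one-folder-per-class rule."""
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    n_images = 0
    for cls in classes:
        for name in os.listdir(os.path.join(root, cls)):
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
                n_images += 1
    # Files directly inside root are never counted: they belong to no class.
    return classes, n_images
```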


(janardhanp22) #54

Yes, exactly as Jeremy explained. Keras expects a subfolder structure for labels.