Kaggle Questions


#41

I used scikit-learn’s StratifiedKFold and generated symlinks in Python to dynamically build the training and validation sets (and reset them, if necessary). I hope that’s OK.
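For readers who want to try the same idea, here is a minimal, dependency-free sketch. It is not the poster's code and it does not call scikit-learn: `stratified_folds` is a hand-rolled stand-in for StratifiedKFold, and both function names and the `train/<label>` and `valid/<label>` layout are assumptions made for illustration.

```python
import os
import random
import shutil
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Map each sample index to a fold so every fold keeps similar class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    fold_of = {}
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled samples of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_folds
    return fold_of


def link_split(files, labels, fold_of, valid_fold, out_dir):
    """Recreate out_dir/train/<label> and out_dir/valid/<label> as symlinks."""
    # Resetting: drop the old link trees first. rmtree removes the links
    # themselves, not the original images they point to.
    for subset in ("train", "valid"):
        shutil.rmtree(os.path.join(out_dir, subset), ignore_errors=True)
    for idx, (src, label) in enumerate(zip(files, labels)):
        subset = "valid" if fold_of[idx] == valid_fold else "train"
        target_dir = os.path.join(out_dir, subset, label)
        os.makedirs(target_dir, exist_ok=True)
        os.symlink(os.path.abspath(src), os.path.join(target_dir, os.path.basename(src)))
```

Calling `link_split` again with a different `valid_fold` rebuilds both directories, which is one way to "reset" the split without copying any image data.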


(melissa.fabros) #42

Thanks for the tip! Also, are the file names the ID names?


(Jeremy Howard) #43

@melissa.fabros yes that’s exactly right!


(bckenstler) #44

Just a heads up: batches.filenames will always list the files in directory order, regardless of whether the generator is set to shuffle or not. So if you don’t shuffle, obtaining the image IDs is straightforward. Has anyone figured out how to obtain the image IDs in the order that a shuffling iterator outputs them? I don’t really see how to do it.


(Jeremy Howard) #45

@bckenstler I think the correct answer is: don’t do that! :wink:
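In practice that suggests leaving shuffling off for the test generator, so that the directory-ordered file names line up with the predictions. A small sketch of that pairing follows; the helper names and the sample values are made up for illustration, with the list standing in for batches.filenames.

```python
import os


def ids_from_filenames(filenames):
    """Turn paths such as 'unknown/42.jpg' into integer ids such as 42."""
    return [int(os.path.splitext(os.path.basename(name))[0]) for name in filenames]


def pair_ids_with_predictions(filenames, predictions):
    """Pair ids with predictions; only meaningful if the generator did not shuffle."""
    ids = ids_from_filenames(filenames)
    if len(ids) != len(predictions):
        raise ValueError("expected exactly one prediction per file")
    return list(zip(ids, predictions))


# Hypothetical values standing in for batches.filenames and the model output.
filenames = ["unknown/10.jpg", "unknown/2.jpg", "unknown/31.jpg"]
predictions = [0.97, 0.03, 0.88]
rows = pair_ids_with_predictions(filenames, predictions)
```

Note that directory order is not numeric order, so the ids should be read from the file names rather than assumed to run 1, 2, 3, and so on.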


(sethiavivek2006) #46

@all I was trying vgg.test on dogs-vs-cats-redux-kernels-edition on my AWS p2 large instance. I have executed all the steps from the lesson 2 lecture for creating the separate directories. But when I run the step shown in the picture, execution gets stuck (indicated by the * preceding the statement) until I interrupt it. Further steps do not work, since the previous step did not finish properly. Any suggestions on how to solve this?


(Jeremy Howard) #47

It’s likely to take up to 10 mins, since it has to run the forward pass of the neural net on all 12,500 images. If it’s taking longer, make sure that it’s using your GPU (check the result of the ‘import utils’ line)


(sethiavivek2006) #48

@jeremy Yes, the previous time I didn’t let it run for more than 10 mins. I tried it again later and it worked nicely. Thank you.


(mattobrien415) #50

I’m plowing thru the Redux notebook, and getting stuck here. This is after successfully loading weights I built last night:

Not sure why – the path is correct, the directory holds the expected 12,500 jpgs.

It seems like the get_batches isn’t finding anything, thus the divide by zero?

I’m wracking my brain but can’t think of any reasons why this is happening. Any advice would be greatly appreciated.

EDIT: strangely enough, after removing the +'test' from the path argument, all seems fine. I don’t think there’s anything wrong with my directory structure… but I suppose all’s well that ends well…


(janardhanp22) #51

Hi @mattobrien415

Have you created a directory named “unknown” under the test directory and moved all your test images into it?
That might resolve the ZeroDivisionError.


(mattobrien415) #52

Thanks for the response, @janardhanp22

No, I didn’t do that… I suppose I didn’t realize there was a point to putting it into an ‘unknown’ subdir. Why not just leave it in the ‘test’ dir?

It looks like the unknown directory just contains all the test data anyway?


(Jeremy Howard) #53

The issue @janardhanp22 is referring to is that keras’ generator needs to know what labels to use for the images. It uses the sub-directory structure for that. If you don’t have labels, you still need the folder structure.


(janardhanp22) #54

Yes, exactly what Jeremy explained. Keras expects a sub-folder structure for labels.
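A minimal sketch of that fix: move every image sitting directly in the test directory into a single sub-folder, so the generator sees one (dummy) class. The function name is made up for illustration, and “unknown” is just the folder name used in this thread; as far as the explanation above goes, any single sub-folder should do.

```python
import os
import shutil


def move_into_unknown(test_dir, subdir="unknown"):
    """Move every file directly under test_dir into test_dir/<subdir>."""
    target = os.path.join(test_dir, subdir)
    os.makedirs(target, exist_ok=True)
    moved = 0
    for name in os.listdir(test_dir):
        source = os.path.join(test_dir, name)
        # Skip directories (including the target itself); move plain files only.
        if os.path.isfile(source):
            shutil.move(source, os.path.join(target, name))
            moved += 1
    return moved
```

After running it once on the test directory, get_batches can be pointed at the test directory itself (not at the sub-folder), and the images should be found under the one class.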


(Abhik Mitra) #55

I am doing this on the Kaggle dogs-vs-cats-redux-kernels-edition data set, after separating the images into the proper folders. I am running this on the samples:
vgg = Vgg16()
sampleBatches = vgg.get_batches(path+'train', batch_size=4)
imgs,labels = next(sampleBatches)
plots(imgs, titles=labels)
vgg.predict(imgs, True)

Result:
(array([ 0.4618, 0.9563, 0.4123, 0.4945], dtype=float32),
array([237, 162, 258, 281], dtype=int64),
[u'miniature_pinscher', u'beagle', u'Samoyed', u'tabby'])

Which is correct.

mainBatches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size)
vgg.finetune(mainBatches)
vgg.fit(mainBatches, val_batches, nb_epoch=1)

Result
Epoch 1/1
1125/1125 [==============================] - 622s - loss: 0.5411 - acc: 0.8640 - val_loss: 0.1199 - val_acc: 0.9760

Now if I run vgg.predict(imgs, True) again, I would expect the confidence to be higher and the predicted classes to stay the same, right?

Instead I get this output:

(array([ 1. , 1. , 0.9995, 1. ], dtype=float32),
array([1, 1, 0, 0], dtype=int64),
[u'goldfish', u'goldfish', u'tench', u'tench'])

What am I doing wrong?


(Jeremy Howard) #56

It’s a little hard to see with your formatting there. Could you put all the relevant code into a gist (https://help.github.com/articles/about-gists/) so that we can see each step?


(Jonathan) #58

Question, do you really pass your password in plain text as a command line argument?

My password has symbols that make the parsing of the command line command go haywire… Any insight?


(Rachel Thomas) #59

@jdgough Put your password in single quotes
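To see why single quotes help: in a POSIX shell nothing inside single quotes is expanded, while characters such as $ are still interpreted inside double quotes. Python's shlex module can produce and check that quoting. This is only a sketch; the password and the some-cli command below are made up, not the real tool's syntax.

```python
import shlex

# Made-up password containing characters a shell would otherwise interpret.
password = "p@$$w0rd!&more"

# shlex.quote wraps the value in single quotes when it contains unsafe characters.
quoted = shlex.quote(password)

# Hypothetical command line, used only to show how the shell would split it.
command = "some-cli --password " + quoted
parsed = shlex.split(command)

print(quoted)
print(parsed)
```

The last element of `parsed` is the original password, unchanged, which is what the command-line tool would receive.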


#60

Yes. This is a required step. Thanks for pointing it out!


(Samuel Ekpe) #61

Just upgrade setuptools using pip; that worked for me.


(Anthony) #62

Thank you very much, this fixed my issue!

It wasn’t clear that we should be using single quotes: formatting the string with double quotes is accepted, and a common-sense check of running kg config then returns the password as '****' with single quotes, which makes it look like it was successful.

May I suggest adding this to the documentation somewhere?