Kaggle Questions

jeff · October 31, 2016, 12:35am

I used scikit-learn’s StratifiedKFold and generated symlinks in Python to dynamically build the training and validation sets (and reset them, if necessary). I hope that’s OK.
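For anyone who wants to try the same idea, here is a minimal sketch of what that could look like. It is not jeff's actual code: the function names `reset_dir` and `link_fold`, the `train`/`valid` layout with one sub-directory per label, and the default parameters are all assumptions made for illustration.

```python
import os
import shutil

from sklearn.model_selection import StratifiedKFold


def reset_dir(path):
    """Remove a directory of symlinks (if present) and recreate it empty."""
    if os.path.isdir(path):
        # rmtree removes the links themselves, not the files they point to.
        shutil.rmtree(path)
    os.makedirs(path)


def link_fold(files, labels, out_dir, n_splits=5, fold=0, seed=0):
    """Symlink one stratified fold into out_dir/train/<label> and out_dir/valid/<label>.

    `files` and `labels` are parallel lists (hypothetical inputs): one image
    path and one class name per image.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    train_idx, valid_idx = list(skf.split(files, labels))[fold]
    for subset, indices in (("train", train_idx), ("valid", valid_idx)):
        reset_dir(os.path.join(out_dir, subset))
        for i in indices:
            label_dir = os.path.join(out_dir, subset, labels[i])
            os.makedirs(label_dir, exist_ok=True)
            link = os.path.join(label_dir, os.path.basename(files[i]))
            os.symlink(os.path.abspath(files[i]), link)
    return len(train_idx), len(valid_idx)
```

Calling `link_fold` again with a different `fold` wipes the old links and builds a fresh split, while the original images stay where they are.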

melissa.fabros · October 31, 2016, 6:08am

thanks for the tip! also, are the file names the id names?

jeremy · October 31, 2016, 4:26pm

@melissa.fabros yes that’s exactly right!

bckenstler · October 31, 2016, 5:32pm

Just a heads up: batches.filenames always lists the files in directory order, regardless of whether the generator is set to shuffle. So if you don’t shuffle, obtaining the image IDs is straightforward, but has anyone figured out how to obtain the image IDs in the order that a shuffling iterator outputs them? I don’t really see how to do it.

jeremy · October 31, 2016, 5:59pm

@bckenstler I think the correct answer is: don’t do that!
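One way to read that advice: leave shuffling off when predicting, so that row i of the predictions lines up with entry i of batches.filenames. A rough sketch of recovering the IDs in that case follows. The helper name `ids_from_filenames` is made up, and it assumes the files are named `<id>.jpg`, as the test files in this competition are.

```python
import os


def ids_from_filenames(filenames):
    """Turn paths such as 'unknown/42.jpg' into integer ids such as 42."""
    return [int(os.path.splitext(os.path.basename(f))[0]) for f in filenames]


# Directory order is alphabetical, so ids come out as 1, 10, 2, not 1, 2, 10.
filenames = ["unknown/1.jpg", "unknown/10.jpg", "unknown/2.jpg"]
ids = ids_from_filenames(filenames)
print(ids)  # [1, 10, 2]
```

With an unshuffled generator, zipping `ids` with the prediction rows pairs each ID with its own prediction.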

sethiavivek2006 · November 5, 2016, 6:13pm

@all I was trying vgg.test for dogs-vs-cats-redux-kernels-edition on my AWS p2 large instance. I have executed all the steps mentioned in the lesson2 lecture for creating separate directories. But when I perform the step shown in the picture, my execution gets stuck (indicated by the * preceding the statement) until I interrupt it. Further steps do not work since the previous step did not finish. Any suggestions on how to solve this?

jeremy · November 5, 2016, 6:46pm

It’s likely to take up to 10 mins, since it has to run the forward pass of the neural net on all 12,500 images. If it’s taking longer, make sure that it’s using your GPU (check the result of the ‘import utils’ line)

sethiavivek2006 · November 6, 2016, 11:53am

@jeremy Yeah, the previous time I didn’t let it run for more than 10 mins. I tried it later and it worked nicely. Thank you.

mattobrien415 · December 11, 2016, 12:35am

I’m plowing thru the Redux notebook, and getting stuck here. This is after successfully loading weights I built last night:

Not sure why – the path is correct, the directory holds the expected 12,500 jpgs.

It seems like the get_batches isn’t finding anything, thus the divide by zero?

I’m wracking my brain but can’t think of any reasons why this is happening. Any advice would be greatly appreciated.

EDIT: strangely enough, after removing the +'test' from the path argument, all seems fine. I don’t think there’s anything wrong with my directory structure…but I suppose all’s well that ends well…

janardhanp22 · December 11, 2016, 2:38am

Hi @mattobrien415

Have you created a directory named “unknown” under the test directory and moved all your test images into it?
That might resolve the ZeroDivisionError.

mattobrien415 · December 11, 2016, 2:42am

Thanks for the response, @janardhanp22

No, I didn’t do that…I suppose I didn’t realize there was a point to putting it into an ‘unknown’ subdir. Why not just leave it in the ‘test’ dir?

It looks like the unknown directory just contains all the test data anyway?

jeremy · December 11, 2016, 10:13pm

The issue @janardhanp22 is referring to is that keras’ generator needs to know what labels to use for the images. It uses the sub-directory structure for that. If you don’t have labels, you still need the folder structure.
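A small sketch of the fix described above, for anyone hitting the same ZeroDivisionError: move the unlabeled test images into a single sub-directory so the generator sees one dummy class. The helper name `move_into_unknown` is made up here, and the name “unknown” is just the convention used in this thread.

```python
import os
import shutil


def move_into_unknown(test_dir, subdir="unknown"):
    """Move every file directly under test_dir into test_dir/<subdir>."""
    target = os.path.join(test_dir, subdir)
    os.makedirs(target, exist_ok=True)
    moved = 0
    for name in os.listdir(test_dir):
        src = os.path.join(test_dir, name)
        # Skip sub-directories (including the target itself).
        if os.path.isfile(src):
            shutil.move(src, os.path.join(target, name))
            moved += 1
    return moved
```

After running something like `move_into_unknown(path + 'test')`, the images live under test/unknown and the path passed to get_batches stays the test directory itself.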

janardhanp22 · December 12, 2016, 6:16pm

Yes exactly what Jeremy explained. Keras is expecting a sub folder structure for labels.

complancoder · December 23, 2016, 9:09pm

I am doing this on the Kaggle (dogs-vs-cats-redux-kernels-edition) data set, after separating the images into the proper directories. I am running this on the samples:
vgg = Vgg16()
sampleBatches = vgg.get_batches(path+'train', batch_size=4)
imgs,labels = next(sampleBatches)
plots(imgs, titles=labels)
vgg.predict(imgs, True)

result
(array([ 0.4618, 0.9563, 0.4123, 0.4945], dtype=float32),
array([237, 162, 258, 281], dtype=int64),
[u'miniature_pinscher', u'beagle', u'Samoyed', u'tabby'])

Which is correct

mainBatches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size)
vgg.finetune(mainBatches)
vgg.fit(mainBatches, val_batches, nb_epoch=1)

Result
Epoch 1/1
1125/1125 [==============================] - 622s - loss: 0.5411 - acc: 0.8640 - val_loss: 0.1199 - val_acc: 0.9760

Now if I do “vgg.predict(imgs, True)” again, I would expect the confidence to be higher and the result to be the same, right?

Instead I get this output:

(array([ 1. , 1. , 0.9995, 1. ], dtype=float32),
array([1, 1, 0, 0], dtype=int64),
[u'goldfish', u'goldfish', u'tench', u'tench'])

What am I doing wrong?

jeremy · December 23, 2016, 10:11pm

It’s a little hard to see with your formatting there. Could you put all the relevant code into a gist (https://help.github.com/articles/about-gists/) so that we can see each step?

jdgough · January 2, 2017, 3:24am

Question, do you really pass your password in plain text as a command line argument?

My password has symbols that make the parsing of the command line command go haywire… Any insight?

rachel · January 4, 2017, 1:49am

@jdgough Put your password in single quotes
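In a POSIX shell, single quotes stop characters such as $ and & from being interpreted, which is why this helps. If the command is being built from a script, Python’s shlex.quote applies the same quoting. The sketch below uses a made-up password and a made-up helper name, `quote_for_shell`, and deliberately leaves out the rest of the command line.

```python
import shlex


def quote_for_shell(value):
    """Quote a string so a POSIX shell passes it through as one literal argument."""
    return shlex.quote(value)


password = "p@$$w0rd!&more"  # made-up example, not a real password
quoted = quote_for_shell(password)
print(quoted)  # 'p@$$w0rd!&more'

# Splitting the quoted form the way a shell would gives back the original string.
assert shlex.split(quoted) == [password]
```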

jingairpi · January 19, 2017, 11:40pm

Yes. This is a required step. Thanks for pointing it out!

geniusgeek · March 3, 2017, 5:36am

Just upgrade setuptools using pip; that worked for me.

anthonys · April 4, 2017, 8:16pm

Thank you very much, this fixed my issue!

It wasn’t clear that we should be using single quotes, since formatting the string with double quotes is accepted, and a common-sense check of running kg config then returns the password as '****' with single quotes, making it look like it was successful.

May I suggest this is added to the documentation somewhere?