Kaggle Questions

bckenstler · October 31, 2016, 5:32pm

Just a heads-up: batches.filenames always lists the files in directory order, regardless of whether the generator is set to shuffle. So if you don't shuffle, obtaining the image IDs is straightforward, but has anyone figured out how to obtain the image IDs in the order a shuffling iterator outputs them? I don't really see how to do it.

jeremy · October 31, 2016, 5:59pm

@bckenstler I think the correct answer is: don’t do that!
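Following that advice, with shuffle=False the rows of the predictions line up with batches.filenames, so the IDs can be read straight from the filenames. Below is a minimal sketch that assumes the Kaggle test images are named '<numeric id>.jpg' and sit in a single subdirectory such as 'unknown/'. The helper name ids_from_filenames is hypothetical. Directory order is not necessarily numeric order, which is why the IDs are parsed rather than assumed to run 1..N.

```python
import os


def ids_from_filenames(filenames):
    """Turn Keras-style relative paths like 'unknown/1234.jpg' into ints.

    Assumes each file is named '<numeric id>.<ext>' and that the iterator
    was created with shuffle=False, so the order matches the predictions.
    """
    ids = []
    for name in filenames:
        # Drop the directory part, then the extension, leaving '1234'.
        stem, _ext = os.path.splitext(os.path.basename(name))
        ids.append(int(stem))
    return ids


# Hypothetical input standing in for batches.filenames
example = ['unknown/1.jpg', 'unknown/10.jpg', 'unknown/100.jpg']
print(ids_from_filenames(example))  # [1, 10, 100]
```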

sethiavivek2006 · November 5, 2016, 6:13pm

@all I was trying vgg.test on the dogs-vs-cats-redux-kernels-edition data on my AWS p2 large instance. I have executed all the steps from the lesson 2 lecture for creating separate directories. But when I run the step shown in the picture, execution gets stuck (indicated by the * preceding the statement) until I interrupt it. The following steps don't work because that step never finished. Any suggestions on how to solve this?

jeremy · November 5, 2016, 6:46pm

It’s likely to take up to 10 mins, since it has to run the forward pass of the neural net on all 12,500 images. If it’s taking longer, make sure that it’s using your GPU (check the result of the ‘import utils’ line)

sethiavivek2006 · November 6, 2016, 11:53am

@jeremy Yes, last time I didn't let it run for more than 10 minutes. I tried again later and it worked nicely. Thank you.

mattobrien415 · December 11, 2016, 12:35am

I’m plowing thru the Redux notebook, and getting stuck here. This is after successfully loading weights I built last night:

Not sure why – the path is correct, the directory holds the expected 12,500 jpgs.

It seems like the get_batches isn’t finding anything, thus the divide by zero?

I’m wracking my brain but can’t think of any reasons why this is happening. Any advice would be greatly appreciated.

EDIT: strangely enough, after removing the +'test' from the path argument, everything seems fine. I don't think there's anything wrong with my directory structure… but I suppose all's well that ends well…

janardhanp22 · December 11, 2016, 2:38am

Hi @mattobrien415

Have you created a directory named “unknown” under the test directory and moved all your test images into it?
That might resolve the ZeroDivisionError.

mattobrien415 · December 11, 2016, 2:42am

Thanks for the response, @janardhanp22

No, I didn't do that… I guess I didn't realize there was a point to putting the images in an ‘unknown’ subdirectory. Why not just leave them in the ‘test’ directory?

It looks like the unknown directory just contains all the test data anyway?

jeremy · December 11, 2016, 10:13pm

The issue @janardhanp22 is referring to is that Keras' generator needs to know what labels to use for the images, and it uses the sub-directory structure for that. Even if you don't have labels, you still need the folder structure.
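A minimal sketch of creating that structure, assuming all test images sit directly inside one test directory as .jpg files and Python 3 is available (for os.makedirs(..., exist_ok=True)). The helper name move_into_unknown is hypothetical. 'unknown' is just the dummy class-folder name used in this thread; any single subfolder name should work.

```python
import glob
import os
import shutil


def move_into_unknown(test_dir, pattern='*.jpg'):
    """Move every image directly inside test_dir into test_dir/unknown.

    Keras infers labels from sub-directories, so an unlabeled test set
    still needs exactly one (dummy) class folder. Returns how many files
    were moved.
    """
    unknown = os.path.join(test_dir, 'unknown')
    os.makedirs(unknown, exist_ok=True)
    moved = 0
    # glob only matches files at the top level of test_dir, so files
    # already inside unknown/ are left alone on a second run.
    for path in glob.glob(os.path.join(test_dir, pattern)):
        shutil.move(path, os.path.join(unknown, os.path.basename(path)))
        moved += 1
    return moved
```

After running something like move_into_unknown(path + 'test'), the generator should be pointed at the test directory itself (not at test/unknown), so that it finds the one subfolder and treats it as a class.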

janardhanp22 · December 12, 2016, 6:16pm

Yes, exactly what Jeremy explained: Keras expects a subfolder structure for labels.

complancoder · December 23, 2016, 9:09pm

I am working on the Kaggle dogs-vs-cats-redux-kernels-edition dataset, after splitting the files into the proper directories. I am running this on the samples:
vgg = Vgg16()
sampleBatches = vgg.get_batches(path+'train', batch_size=4)
imgs, labels = next(sampleBatches)
plots(imgs, titles=labels)
vgg.predict(imgs, True)

Result:
(array([ 0.4618, 0.9563, 0.4123, 0.4945], dtype=float32),
array([237, 162, 258, 281], dtype=int64),
[u'miniature_pinscher', u'beagle', u'Samoyed', u'tabby'])

That is correct.

mainBatches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size)
vgg.finetune(mainBatches)
vgg.fit(mainBatches, val_batches, nb_epoch=1)

Result
Epoch 1/1
1125/1125 [==============================] - 622s - loss: 0.5411 - acc: 0.8640 - val_loss: 0.1199 - val_acc: 0.9760

Now if I run vgg.predict(imgs, True) again, I would expect higher confidence and the same classes, right?

Instead I get this output:

(array([ 1. , 1. , 0.9995, 1. ], dtype=float32),
array([1, 1, 0, 0], dtype=int64),
[u'goldfish', u'goldfish', u'tench', u'tench'])

What am I doing wrong?
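One plausible reading of this output: after finetune the last layer has only two outputs, so the predicted indices are 0 and 1. These indices refer to the two training subfolders, in the order given by mainBatches.class_indices. In the ImageNet class list, index 0 is 'tench' and index 1 is 'goldfish'. So if the copy of vgg16.py in use does not refresh vgg.classes after fine-tuning (an assumption, since the code isn't shown), the printed names are stale ImageNet labels rather than wrong predictions. The sketch below maps indices back to folder names. The helper name index_to_class is hypothetical, and the dict and indices are example values.

```python
def index_to_class(class_indices):
    """Invert a Keras-style class_indices dict, e.g. {'cats': 0, 'dogs': 1},
    into a list where position i holds the name of class i."""
    names = [None] * len(class_indices)
    for name, idx in class_indices.items():
        names[idx] = name
    return names


# Hypothetical values: in the notebook the dict would be
# mainBatches.class_indices and the indices the second element
# returned by vgg.predict(imgs, True).
class_indices = {'cats': 0, 'dogs': 1}
pred_idxs = [1, 1, 0, 0]
names = index_to_class(class_indices)
print([names[i] for i in pred_idxs])  # ['dogs', 'dogs', 'cats', 'cats']
```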

jeremy · December 23, 2016, 10:11pm

It’s a little hard to see with your formatting there. Could you put all the relevant code into a gist (https://help.github.com/articles/about-gists/) so that we can see each step?

jdgough · January 2, 2017, 3:24am

Question: do you really pass your password in plain text as a command-line argument?

My password has symbols that make the command-line parsing go haywire… Any insight?

rachel · January 4, 2017, 1:49am

@jdgough Put your password in single quotes
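Why single quotes help: in a POSIX shell, characters such as $, backquote and \ keep their special meaning inside double quotes, and in interactive bash so does ! (history expansion). Inside single quotes every character is literal. Python's shlex module can show the safely quoted form of a password. The example password and the '-p' flag below are hypothetical placeholders, not verified kg options.

```python
import shlex

# Example password containing characters a shell would otherwise interpret.
password = 'pa$$w0rd!`id`&more'

# shlex.quote wraps the string in single quotes (escaping any embedded
# single quote), the same fix as putting the password in single quotes.
quoted = shlex.quote(password)
print(quoted)  # 'pa$$w0rd!`id`&more'

# Hypothetical command line; '-p' is a placeholder flag. shlex.split splits
# words the way a POSIX shell does, and the password comes back intact.
argv = shlex.split('kg config -p ' + quoted)
print(argv[-1] == password)  # True
```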

jingairpi · January 19, 2017, 11:40pm

Yes. This is a required step. Thanks for pointing it out!

geniusgeek · March 3, 2017, 5:36am

Just upgrade setuptools using pip (pip install --upgrade setuptools); that worked for me.

anthonys · April 4, 2017, 8:16pm

Thank you very much, this fixed my issue!

It wasn't clear that we should be using single quotes. The string is accepted when quoted with double quotes, and a common-sense check with kg config then shows the password as '****' in single quotes, which makes it look as if it was set successfully.

May I suggest adding this to the documentation somewhere?

darthdeus · April 30, 2017, 2:12am

This is probably a silly question, but what does the Kaggle submission score mean? Is it accuracy percentage on the test set? Or is it a percentile?

I've tried two submissions so far: one using a very small training set (100 images) just to see if it works, and another using 2000 images. They scored 31.18 and 32.86 respectively, although both reached around 99% accuracy on the validation set.

simoneva · April 30, 2017, 9:08am

Each Kaggle competition has a page explaining how it is scored. A common metric is log loss.
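To make that concrete, here is a minimal sketch of binary log loss, -mean(y*log(p) + (1-y)*log(1-p)), where p is the predicted probability of class 1 (here, dog). It shows why high accuracy can still score badly: a single confidently wrong prediction dominates the average. The eps=1e-15 clipping inside the function and the 0.05/0.95 clipping of the predictions are illustrative assumptions. Kaggle's exact implementation may differ.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: -mean(y*log(p) + (1-y)*log(1-p)).

    Predictions are clipped to [eps, 1-eps] so that a prediction of
    exactly 0 or 1 does not produce log(0).
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


labels = [1, 1, 1, 0]            # example data: 1 = dog, 0 = cat
confident = [1.0, 1.0, 1.0, 1.0] # the last prediction is confidently wrong
# Clipping the predictions bounds the penalty for any single mistake,
# at the cost of a small loss on the predictions that were right.
clipped = [min(max(p, 0.05), 0.95) for p in confident]

print(log_loss(labels, confident))  # roughly 8.6
print(log_loss(labels, clipped))    # roughly 0.79
```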

Floriano · July 14, 2017, 2:45pm

Good morning,
I am completely new to this course. I set up a p2 instance and worked my way through lesson 1. It was great fun and I have learned a lot so far. Thanks for the great course. As suggested, I made an account on Kaggle, chose Dogs vs. Cats Redux, accepted the rules and uploaded a few different submission files (with only minor differences). The best score I got was 0.09185, which would place somewhere in the 300s.
Under my submissions I can find all of the uploads, but I cannot find myself on the leaderboard. Am I doing something wrong here? Searching Kaggle has not enlightened me so far.
Judging by the score of my best submission, I am far, far away from the top 50. So far I have run 4 epochs, which did give some improvement, but nothing that suggests a few more epochs would get me into the top 50.
I use the correct label (the one for dogs). I experimented with the log loss values, which had some effect, but again nothing that looks like it could change the game.
So I am a bit stuck. Either I got something very wrong, or the level of this competition has risen over the last 6 months, so that further, currently unknown steps are needed to make significant improvements.
Thanks in advance for any hints.