Lesson 1 discussion

Understanding the “preds” values.

batches, preds = vgg.test(path + 'test', batch_size=batch_size*2)
preds[:5]:
[[ 2.8490e-05  9.9997e-01]
 [ 1.2563e-06  1.0000e+00]
 [ 4.8328e-06  1.0000e+00]
 [ 1.0000e+00  4.2082e-12]
 [ 1.0000e+00  1.1853e-16]]

isdog = preds[:,1]

I have two questions on this code:

  1. The ‘preds’ array stores each cat’s probability in the first column and each dog’s probability in the second column, which is why isdog = preds[:,1] extracts all the dogs’ probabilities. For this task, the cats’ probabilities can be ignored.

  2. Looking at the 5 examples, why don’t the two probabilities in each line add up to 1? Since it’s a two-class classification, each probability pair should sum to 1, right?


Right. It looks like display rounding: the printed values are rounded to about five significant digits, so each pair sums to 1 only up to that precision.
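You can check this quickly with numpy; a small demo using the printed values above:

import numpy as np

# softmax outputs sum to 1 (up to float error); the notebook just prints
# them rounded to ~5 significant digits, so re-adding the printed pair
# can miss 1 by ~1e-5
row = np.array([2.8490e-05, 9.9997e-01])
print(row.sum())                    # ~0.9999985, i.e. 1 up to rounding
np.set_printoptions(precision=10)   # widen the display to see more digits
print(row)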

I more or less repeated the redux notebook and submitted to Kaggle, but only ranked around 99th, with a loss of 0.10400. Seems hard to get into the first half. :slight_smile:

Hi Daniel,

great questions!

1) get_batches() is a generator that will feed the entire dataset to your predict function 64 images at a time. Even though you call it once, it will process the entire dataset in the file path you pass in.

2) This is lesson one, so the idea is to have fun playing with the tools you’ve been given. You don’t need to write a new predict function; it’s part of the Keras library and has been packaged nicely for us to use easily as a class method in the vgg.py file. The overarching goal for lesson 1 is to learn how to set up your server, manage the input files and functions, train the model, and create an output file.

You’ll only really know how well your model does once you upload your output to the Kaggle competition grader.

3) Try researching the Python numpy library. It has lots of handy functions, including one called savetxt() :slight_smile:
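For example, a minimal sketch (assuming ids, parsed from the test filenames, and the isdog array from above; the exact format string is just one option):

import numpy as np

# stack the image ids next to the dog probabilities, then write a
# Kaggle-style csv with an 'id,label' header and no '#' comment prefix
subm = np.stack([ids, isdog], axis=1)
np.savetxt('submission.csv', subm, fmt='%d,%.5f',
           header='id,label', comments='')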

Hope that helps some & happy new year!
m


Thanks Melissa, Happy New Year to you too!

A tip for anyone trying to run lesson 1 on a MacBook Pro: to get the GPU to work, unplug any additional monitors, as this frees up valuable GPU memory (VRAM). Then reduce the batch size to 2 and run through the example. You can gradually increase the batch size until GPU memory is exhausted, which shows up as a MemoryError stack trace. In my case I am using a batch size of 8 with a GT 750M GPU, which has 2 GB of VRAM.
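A minimal sketch of that approach, assuming the lesson 1 notebook’s vgg object and directory layout:

# start small and raise batch_size until a MemoryError appears, then
# fall back to the last size that worked (8 on my 2 GB GT 750M)
batch_size = 2
batches = vgg.get_batches(path + 'train', batch_size=batch_size)
val_batches = vgg.get_batches(path + 'valid', batch_size=batch_size * 2)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)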

Also, if you have a MacBook with two video cards and automatic graphics switching, ensure you follow these instructions to disable the automatic switching: https://support.apple.com/en-au/HT202043


Thanks for this comment. This saved me some time :slight_smile:

Just a note that I was able to get lesson 1 to run locally on a Windows machine: Windows 10, Python 3, Nvidia GTX 960.
There are a few resources that show how to do this, e.g. “Theano setup on Windows”.
I did get an “out of memory” error and had to change to batch_size=10 to fix that.
I also use a tool called GPU-Z to monitor/verify that the GPU is getting used.
If it would help, I could throw together a Windows “How to”.

Please do! I can create a wiki user id for you if you like, so you can put it right on the wiki :slight_smile:

please do! Thanks

Hi Peter,

Thanks for this post.

I am using a Mac Pro, and I am able to run Redux.ipynb locally on the sample set. But it looks like it will take ages to finish when I try to use the model built on the sample data to predict on the full test dataset.

When you say “perform lesson 1 on a MacBook Pro”, do you mean you can train and predict on the full train and test datasets locally? If so, how long does it take to train and predict on your Mac for lesson 1?

Here are my Mac Pro details; I don’t see any GPU info. Does that mean my Mac has no GPU and cannot do the same as you did?

By the way, in order to import the libraries without error, I had to comment out some modules:

# from utils.py
from keras.layers import Input, Embedding, Reshape, merge, LSTM #, Bidirectional
# from vgg16.py and vgg16bn.py
#from keras.layers.pooling import GlobalAveragePooling2D

As I don’t see those modules used anywhere in redux.ipynb, utils.py, or vgg16.py, I guess they will not be used at all in this deep learning course, right? Or are the two modules actually useful but not easy to install locally? @jeremy

Thanks
Daniel

Hi @Daniel

Yes, for lesson 1 I can train against the full cats and dogs dataset. Here is the training output with a batch size of 8, running on the MacBook GPU.

Found 23000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/1
23000/23000 [==============================] - 4604s - loss: 0.1857 - acc: 0.9695 - val_loss: 0.1320 - val_acc: 0.9810

It looks like your MacBook does not have an Nvidia GPU. Here is the output of my MacBook system info.

I think if you try to train on the MacBook CPU it will take a very long time. I first trained the full dataset on a 10-core CPU system (with Theano’s multi-core CPU support enabled) and it took twice as long as training on the MacBook GPU.

You may need to update your python modules so that you don’t need to comment out those lines.

I installed Anaconda (from https://www.continuum.io/downloads) and then updated the keras module using the command:

pip install --upgrade keras

Hope this helps.


Hi Peter,

Thank you very much for your help!

I have access to a PC (Windows) with an Nvidia GeForce GTX 960M, an Intel Core i7-6700HQ CPU, and 16 GB of RAM. If I get everything set up on this PC and run lesson 1, can you estimate the time I would need to train and test? Do you think it can finish within 2 to 3 hours?

And yes, my keras was outdated; after updating it, everything works without commenting out the two lines of code.

Also, given my current Mac, I can train and predict on the samples (200 training images, 20 test images) in a very short time, but predicting on the full test set seems endless.

My question (could @jeremy have a look too?):
(I assume the prediction work is not as complex as the fine-tuning and fitting work; I could be totally wrong.) Is there a way to speed up prediction on the full test set on my Mac, say finishing within 2 hours? Or do I need a GPU to achieve that speed?

Thanks

Daniel

It is in the ~/.theanorc file:

[global]
device = gpu
floatX = float32
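To confirm the settings took effect, a quick check in Python (these are standard Theano config attributes):

import theano

# both values come from ~/.theanorc (or the THEANO_FLAGS env var)
print(theano.config.device)   # expect 'gpu' (or 'gpu0')
print(theano.config.floatX)   # expect 'float32'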

Got everything to work now. However, during one run the download of the weights was aborted, and now h5py errors out on the cached (truncated) file. How can I reset the cache or force a fresh download?

/home/mclasson/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in load_weights(self, filepath, by_name)
   2693         '''
   2694         import h5py
-> 2695         f = h5py.File(filepath, mode='r')
   2696         if 'layer_names' not in f.attrs and 'model_weights' in f:
   2697             f = f['model_weights']

IOError: Unable to open file (Truncated file: eof = 28778496, sblock->base_addr = 0, stored_eoa = 553620808)

Yes, it should finish within 2-3 hours, on the basis that a 960M has more memory and a higher clock rate than a 750M.

My prediction on the test set ran for 36 minutes with the GPU.

Perhaps the best thing is to confirm everything is working with a small sample, then set it to run while you are sleeping. By morning it should be finished :slight_smile:


I think you will need to delete the incomplete h5 file under your home directory. See below.

$ ls -la ~/.keras/models
total 1081096
drwxr-xr-x  4 m  staff        136 30 Dec 02:05 .
drwxr-xr-x  4 m  staff        136 30 Dec 00:54 ..
-rw-r--r--  1 m  staff      35363 30 Dec 02:05 imagenet_class_index.json
-rw-r--r--  1 m  staff  553482496 30 Dec 02:05 vgg16.h5
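If you’d rather do it from the notebook, a hypothetical equivalent in Python:

import os

# delete the truncated cache file; Keras will re-download the weights
# the next time the model is loaded
cache = os.path.expanduser('~/.keras/models/vgg16.h5')
if os.path.exists(cache):
    os.remove(cache)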


I literally found it 1 second before you posted this. :slight_smile: Thanks!

So I got a 0.70184 on the competition. I am unsure how we can make the score better. I am now trying to train the model with 2 epochs instead of one.

EDIT: Actually, that did it; the score is now 0.67697, and I am now ahead of the All 0.5 Benchmark.
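For reference, here is roughly the loop I am running (a sketch assuming the notebook’s vgg, batches and val_batches; saving the weights after each epoch lets you go back to whichever one scored best):

no_of_epochs = 2
for epoch in range(no_of_epochs):
    vgg.fit(batches, val_batches, nb_epoch=1)
    # save a checkpoint per epoch under the results directory
    vgg.model.save_weights(path + 'results/ft%d.h5' % epoch)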

Questions:
1- Would running the predictions again on the same test set give different results? (I’m guessing not.)
2- Would training again with the same number of epochs on the same training set give different results?
3- Does anybody here know the preferred number of epochs for this model to get the highest accuracy? Is it 1-5? 5-10? 10-20?

Thanks a lot Peter