Lesson 1 discussion

Tried re-running it, no luck. I am in Chrome.

Do the tutorial examples (in a notebook) work? http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html

I will check, thank you

One other thing to note - the %matplotlib inline command has to be run before matplotlib gets imported:

If you’re not sure, try restarting the kernel (under the “Kernel” menu) and then execute the %matplotlib inline cell before any other cells.



Hi Rachel,
I can run the lesson 1 notebook on the dogcats data with no problem.
However, I am not sure how to run it with the Dogs vs. Cats Redux data. I downloaded the train and test files from Kaggle using kaggle-cli, but they are not separated into dogs/cats images. Can you explain in more detail how we are supposed to run the vgg on this new data and submit it?


@layla.tadjpour you’ll need to write a script that separates it into the correct folder names. If you get stuck, you’ll find one written for you by @vshets that you could look at for inspiration in this thread: Kaggle Questions
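For anyone who wants a starting point, here is a minimal sketch of such a script. It is not the one from the Kaggle Questions thread. It assumes the downloaded training images are named like cat.0.jpg and dog.0.jpg and that the notebook expects cats/ and dogs/ subfolders, as in the lesson 1 dogscats data; the function name sort_into_class_folders is made up for illustration.

```python
import shutil
from pathlib import Path


def sort_into_class_folders(train_dir):
    """Move files such as cat.0.jpg / dog.0.jpg into cats/ and dogs/ subfolders.

    Assumption: the class name is the part of the file name before the
    first dot. Returns how many files were moved per folder.
    """
    train_dir = Path(train_dir)
    moved = {"cats": 0, "dogs": 0}
    for label, folder in (("cat", "cats"), ("dog", "dogs")):
        target = train_dir / folder
        target.mkdir(exist_ok=True)
        # Only match image files directly inside train_dir, e.g. cat.123.jpg
        for image in sorted(train_dir.glob(label + ".*.jpg")):
            shutil.move(str(image), str(target / image.name))
            moved[folder] += 1
    return moved
```

You would call it once on the unzipped train folder, e.g. sort_into_class_folders("train"), and then split off a validation set in the same way.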


Anyone else having an issue where, after the model is finetuned to make predictions on the dogs vs cats redux data, the predictions are binary? When I generate predictions on dogs/cats data with the vgg model before fine tuning, I get fractional probabilities for each of the 1000 imagenet categories (e.g. .23 likelihood of Egyptian cat). However, after finetuning, I never get any predictions with fractional probabilities between 0 and 1; they’re always exactly 0 or exactly 1. I ask because the scoring function on kaggle is more forgiving of incorrect predictions closer to .5, e.g. for a given example, .55 likelihood of dog, .45 of cat, rather than 1 of dog, 0 of cat.
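To make the scoring point concrete, here is a small sketch of binary log loss, which is the metric the competition's evaluation page describes. The eps clamp is my own assumption so that log(0) never occurs; the exact bound Kaggle uses is not stated in this thread.

```python
import math


def log_loss(y_true, p_dog, eps=1e-15):
    """Mean binary log loss; y_true is 1 for dog and 0 for cat."""
    total = 0.0
    for y, p in zip(y_true, p_dog):
        # Clamp so that log(0) never occurs (eps is an assumed bound)
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# The image is really a cat (label 0) in both cases.
hedged_wrong = log_loss([0], [0.55])   # said .55 dog
certain_wrong = log_loss([0], [1.0])   # said 1.0 dog
print(hedged_wrong, certain_wrong)
```

The hedged wrong answer costs well under 1, while the fully confident wrong answer costs dozens of times more, so a single such image can dominate the average.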


I’m having the same issue, but in addition it’s assigning classes of goldfish and tench to the images instead of dog and cat. These are the first two classes in the original vgg/imagenet model, so I guess Keras didn’t pull the names from the training directory structure?

I can totally replace the labels, but I get the idea I’ve missed something.

@tom / @jbrown81 please let us know exactly what code you’re using, and what results you get - some tips on asking for help are here: http://wiki.fast.ai/index.php/How_to_ask_for_Help . Once we have this info, I’m sure we’ll be able to resolve your problems.

Can I reflect back what I think the todo list is for lesson 1?

  1. Get AWS instance running (either g2 or m4 if not yet approved for g2) after contacting support etc.
  2. Set up ssh keys as per instructions in setup video
  3. install bash setup script onto server instance
  4. launch jupyter notebook on the instance
  5. once the notebook is running, review the lesson 1 notebook notes and run each cell of code to figure out what python and vgg are doing
  6. install kaggle CLI onto the server instance
  7. use the kaggle CLI to download the current data for the Dogs vs. Cats Redux competition
  8. configure the new data to the file structure in the same way that was used in the sample lesson 1 notebook
  9. make a copy of the lesson 1 notebook and use the new copy to draw in the new Dogs Vs. Cats data (including moving utils.py and vgg16.py to the new folder where the new notebook sits?)
  10. Run the relevant code cells on the sample set of new Dogs v. Cats data to make a prediction on the new image data set.
  11. Once the sample set works, modify the jupyter notebook to use the new test data images
  12. write a script that takes the predict() data of the new Dogs vs. Cats data and writes it to a new csv file in the format of the sample_submission.csv file that was downloaded with the Dogs vs. Cats data
  13. submit that new submission.csv file to kaggle via the CLI tool
  14. check the public scoreboard for your own ranking
  15. modify or tune current code in the lesson 1 notebook to try to get into the top 50% ranking of the current Dogs v Cats competition
  16. start exploring the other new datasets on kaggle and decide which one you or some teammates would like to study further during the course
  17. download the new data to your EC2 instance and repeat the previous steps with your brand new data.
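Step 12 could be sketched roughly as below. This assumes sample_submission.csv has a header row of id,label, where label is the predicted probability that the image is a dog; check your downloaded copy before relying on that. The function name write_submission and the five-decimal formatting are illustrative choices, and the sketch is written for Python 3.

```python
import csv


def write_submission(ids, dog_probs, out_path="submission.csv"):
    """Write one row per test image: its id and predicted probability of dog."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Assumed header, taken from the sample submission format
        writer.writerow(["id", "label"])
        for image_id, prob in zip(ids, dog_probs):
            writer.writerow([image_id, "%.5f" % prob])
```

The ids would come from the test file names and dog_probs from whatever your prediction step returns for the dog class.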

Is this about right?

I understand that the first lesson is mostly about getting comfortable with a terminal to manipulate a server instance in the cloud, how to organize raw data, and how to participate in the kaggle community.

Are we also learning how to fine tune the model onto the data? The examples with the VGG() model seem already optimized with batch=64. To submit to the kaggle competition, did you mean for us to optimize the “Create a VGG model from scratch in Keras” section with the new Dogs v Cats data in order to get above 50% in the kaggle ranking?


At this stage of the class, what knobs could we be using for fine-tuning VGG16 for dogs vs cats and scoring “well” (within top half of leaderboard) in the Kaggle competition?

  • running more epochs?
  • tweaking the optimizer parameters? (not really covered yet…)

** I wanted to add that I was able to submit and get on the board but a little bit shy of the top half :slight_smile:

I think discussion about how to tune the model is happening here:


If anyone is looking for documentation on the methods that a VGG object takes:

Could be helpful in tuning the prediction or understanding the parameters and returns of methods like predict()


hi @leahob. same here, i was a bit shy of the second half after submitting (running more epochs seemed to improve the score, didn’t have a chance to fiddle with the optimizer parameters much)

this happened to me on some runs, but not on others (e.g. for some values of num_epochs but not for others)


This is a wonderful resource for everyone - thanks @melissa.fabros !

What was your validation set accuracy? It’s easier to make suggestions once we know how you’re going so far.

Actually I can give you a strong hint - look at the equation here: https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/details/evaluation . Have a think about what minor change to your output might make a big difference to that evaluation function. Hint: I went from 105th spot to 37th spot by running two little commands in my text editor - no need to even start up my AWS instance…


Thanks @jeremy
First, I ran the vgg model as is (with 1000 class output, without fine-tuning to dogsvscatsredux) to generate predictions on the first 4 test images like so:
from vgg16 import Vgg16
vgg = Vgg16()
batches = vgg.get_batches(path+'test', batch_size=4, shuffle=False)
imgs,labels = next(batches)
vgg.predict(imgs, True)
and the output I get is:
(array([ 0.2321, 0.5742, 0.2567, 0.5104], dtype=float32),
array([285, 246, 229, 285]),
[u'Egyptian_cat', u'Great_Dane', u'Old_English_sheepdog', u'Egyptian_cat'])

Next, I fine tuned the vgg model to generate two-class predictions like so:
vgg = Vgg16()
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size)
vgg.fit(batches, val_batches, nb_epoch=1)
Training completed, and then I ran prediction on the test set:
batches = vgg.get_batches(path+'test', batch_size=4, shuffle=False)
imgs,labels = next(batches)
vgg.predict(imgs, True)
and the output I get is:
(array([ 1., 1., 1., 1.], dtype=float32),
array([0, 1, 1, 0]),
[u'tench', u'goldfish', u'goldfish', u'tench'])

The predictions on the first 4 test images look correct (cat, dog, dog, cat). What I’m puzzled by is why the probabilities are always exactly 1. With a softmax output, I expect the class probability values to be somewhere between 0 and 1, like they are with the original vgg net.
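One possible explanation, not confirmed anywhere in this thread: softmax output is mathematically always strictly between 0 and 1, but in floating point it can round to exactly 1 once the gap between the raw scores is large enough. The sketch below shows that effect with a plain-Python softmax and made-up scores; it says nothing about what the fine-tuned model's scores actually are. Separately, if arrays are printed with limited precision, a value like 0.99999 would also display as 1.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for two classes, purely for illustration
moderate = softmax([1.0, 0.0])     # small gap: clearly fractional
saturated = softmax([40.0, 0.0])   # large gap: first value rounds to 1.0
print(moderate, saturated)
```

If this is what is happening, the model is extremely confident rather than broken, which matters for a metric that punishes confident mistakes.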

I’m running this on a p2 instance fwiw.


Can confirm this worked :slight_smile: