Lesson 1 discussion

(Jeremy Howard) #1

Use this thread for any questions or comments on lesson 1!

Here is the video: https://vimeo.com/188790099 . Password: dl_course . (Please don’t share the video publicly - we’ll be producing it properly after the course when we create the MOOC.)

The wiki page for this lesson is here: http://wiki.fast.ai/index.php/Lesson_1 . Don’t forget to look at this page if you have any questions or problems, and to add any FAQs, resources, or comments to the wiki page that you can. If you haven’t edited a wiki before, have a look at the links here for help: https://www.mediawiki.org/wiki/Help:Editing .

(yad.faeq) #2

Relevant cheatsheets to some of the tools/libraries mentioned in class 1:

  1. Cheatsheet for tmux and screen

  2. Markdown Cheatsheet

(Rachel Thomas) #3

Assignment for week 1:

  • Finish getting your EC2 instance set up and be able to run the lesson 1 notebook. If you have any problems, you can post questions on the AWS setup topic or the running lesson 1 notebook topic.
  • Set up the kaggle CLI, and use it to download data for the Dogs vs Cats: Redux competition (a different dataset than the one we shared with you) to your EC2 instance.
  • Successfully submit to the above competition, using kaggle-cli, and try to get in the top 50%
  • Modify the lesson 1 notebook to work with the new dataset.
  • Look through other image data sets and choose one to work on during the course. You should talk with your group about whether you all want to work on the same dataset or not (either is fine). Check if a forum thread already exists for that dataset, and if not, create one. You can use the thread to talk with other students who are working with the same dataset.
  • Download the data to your EC2 instance, and modify the lesson 1 notebook to run on a sample subset of the data.

(ashley.fuller) #4

Hi Rachel,

I am working through Dogs and Cats and finding myself having trouble with the %matplotlib inline command. Other commands work fine but when I get to the point that images are supposed to be generated, I see nothing even though the code appears to run. Any words of wisdom on how to address this problem?

Thank you

(Jeremy Howard) #5

@ashley.fuller when asking for help, try to provide all the information you can to help the folks here help you. In particular, tell us exactly what error messages you receive, and what you’ve typed - particularly if you’ve done anything differently to the lesson1 notebook. And let us know what you’re running on - AWS p2 or t2 instance? Your own computer (if so, what OS)? Screenshots can also be helpful.

With this information I hope we’ll be able to fix your issue quickly! :slight_smile:

(ashley.fuller) #6

Sorry! I am running on my own Mac (OS X El Capitan, version 10.11.5), connected to a t2.large instance. I simply ran the line as it appears in the lesson:
%matplotlib inline
I got no error message and so thought everything was alright. But when I reach the line:
plots(imgs, titles=labels)

It runs, but no plots appear, and I am not getting any error messages. My best guess is that the matplotlib line went wrong somehow, since I see no pictures!

Thank you.

(Jeremy Howard) #7

Hmm that’s odd. Can you try re-running the %matplotlib line? Do the tutorial examples (in a notebook) work?: http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html

What browser are you using?

(ashley.fuller) #8

Tried re-running it, no luck. I am in Chrome.

(Jeremy Howard) #9

Do the tutorial examples (in a notebook) work?: http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html

(ashley.fuller) #10

I will check, thank you

(Tom Elliot) #11

One other thing to note - the %matplotlib inline command has to be run before matplotlib gets imported.

If you’re not sure, try restarting the kernel (under the “Kernel” menu) and then execute the %matplotlib inline cell before any other cells.


(layla.tadjpour) #12

Hi Rachel,
I can run the lesson 1 notebook on the dogcats data with no problem.
However, I am not sure how to run it with the Dogs vs Cats: Redux data. I downloaded the train and test files from kaggle using kaggle-cli, but they are not separated into dogs/cats images. Can you explain in more detail how we are supposed to run the vgg on this new data and submit it?

(Jeremy Howard) #13

@layla.tadjpour you’ll need to write a script that separates it into the correct folder names. If you get stuck, you’ll find one written for you by @vshets that you could look at for inspiration in this thread: Kaggle Questions
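
A minimal sketch of what such a script could look like, assuming the Kaggle training images are named like cat.0.jpg and dog.0.jpg and that the notebook expects cats/ and dogs/ subfolders. Both of those are assumptions, and sort_by_label is a made-up helper name, not part of the course code or of @vshets' script:

```python
import os
import shutil


def sort_by_label(train_dir, labels=("cat", "dog")):
    """Move files named '<label>.<n>.jpg' into '<label>s/' subfolders of train_dir."""
    moved = {label: 0 for label in labels}
    for label in labels:
        # Assumed folder names: 'cats' and 'dogs'.
        os.makedirs(os.path.join(train_dir, label + "s"), exist_ok=True)
    for name in sorted(os.listdir(train_dir)):
        src = os.path.join(train_dir, name)
        if not os.path.isfile(src):
            continue  # skip the class folders themselves
        label = name.split(".")[0]  # assumed naming: the label is the first dot-separated part
        if label in moved:
            shutil.move(src, os.path.join(train_dir, label + "s", name))
            moved[label] += 1
    return moved
```

Calling sort_by_label on the unzipped train folder returns a count of moved files per label, which is a quick sanity check that nothing was skipped. Files whose names do not start with a known label are left where they are.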

(jbrown81) #14

Is anyone else having an issue where, after the model is fine-tuned to make predictions on the dogs vs cats redux data, the predictions are binary? When I generate predictions on dogs/cats data with the vgg model before fine-tuning, I get fractional probabilities for each of the 1000 imagenet categories (e.g. .23 likelihood of Egyptian cat). However, after fine-tuning, I never get any predictions with fractional probabilities between 0 and 1; they’re always exactly 0 or exactly 1. The reason I ask is that the scoring function on kaggle is more forgiving of incorrect predictions closer to .5, e.g. for a given example, .55 likelihood of dog and .45 of cat, rather than 1 of dog and 0 of cat.
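
To make the scoring point concrete, here is a small sketch that assumes the competition is scored with binary log loss (an assumption; the metric is not named in this thread). The log_loss function and its eps guard are illustrative only, not Kaggle's actual implementation:

```python
import math


def log_loss(y_true, p_dog, eps=1e-15):
    """Binary log loss for one example; y_true is 1 for dog, 0 for cat."""
    # eps is an assumed guard that keeps log() finite when p_dog is exactly 0 or 1.
    p = min(max(p_dog, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# A cat (y_true=0) wrongly predicted as dog, at two confidence levels.
for p_dog in (0.55, 1.0):
    print(p_dog, log_loss(0, p_dog))
```

Under that assumption, the wrong-but-hedged prediction of .55 costs far less than the wrong-and-certain prediction of 1, which is why predictions of exactly 0 or 1 can hurt the score even when most of them are correct.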

(Tom Elliot) #15

I’m having the same issue, but in addition it’s assigning classes of goldfish and tench to the images instead of dog and cat. These are the first two classes in the original vgg/imagenet model, so I guess Keras didn’t pull the names from the training directory structure?

I can totally replace the labels, but I get the idea I’ve missed something.

(Jeremy Howard) #16

@tom / @jbrown81 please let us know exactly what code you’re using, and what results you get - some tips on asking for help are here: http://wiki.fast.ai/index.php/How_to_ask_for_Help . Once we have this info, I’m sure we’ll be able to resolve your problems.

(melissa.fabros) #17

Can I reflect back what I think the todo list is for lesson 1?

  1. Get AWS instance running (either g2 or m4 if not yet approved for g2) after contacting support etc.
  2. Setup ssh keys as per instructions in setup video
  3. install bash setup script onto server instance
  4. launch jupyter notebook on the instance
  5. once the notebook is running, review the lesson 1 notebook notes and run each cell of code to figure out what python and vgg are doing
  6. install kaggle CLI onto the server instance
  7. use the kaggle CLI to download the current data for the Dogs vs. Cats Redux competition
  8. configure the new data to the file structure in the same way that was used in the sample lesson 1 notebook
  9. make a copy of the lesson 1 notebook and use the new copy to draw in the new Dogs Vs. Cats data (including moving utils.py and vgg16.py to the new folder where the new notebook sits?)
  10. Run the relevant code cells on the sample set of new Dogs v. Cats data to make a prediction on the new image data set.
  11. Once the sample set works, modify the jupyter notebook to use the new test data images
  12. write a script that takes the predict() output for the new Dogs vs. Cats data and writes it to a new csv file in the format of the sample_submission.csv file that was downloaded with the Dogs vs. Cats data
  13. submit that new submission.csv file to kaggle via the CLI tool
  14. check the public scoreboard for your own ranking
  15. modify or tune current code in the lesson 1 notebook to try to get into the top 50% ranking of the current Dogs v Cats competition
  16. start exploring the other new datasets on kaggle and decide which one you or some teammates would like to study further during the course
  17. download the new data to your EC2 instance and repeat the previous steps with your brand new data.

Is this about right?

I understand that the first lesson is mostly about getting comfortable with a terminal to manipulate a server instance in the cloud, how to organize raw data, and how to participate in the kaggle community.

Are we also learning how to fine-tune the model on the data? The examples with the VGG() model seem already optimized with batch=64. To submit to the kaggle competition, did you mean for us to optimize the “Create a VGG model from scratch in Keras” section with the new Dogs v Cats data in order to get into the top 50% of the kaggle ranking?
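
Step 12 in the list above could be sketched roughly like this. The header row (id, label) and the meaning of the label column (probability that the image is a dog) are assumptions to check against the sample_submission.csv you downloaded, and write_submission is a hypothetical helper name:

```python
import csv


def write_submission(ids, dog_probs, path="submission.csv"):
    """Write one row per test image: its id and the predicted probability of 'dog'."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])  # assumed header; compare with sample_submission.csv
        for image_id, prob in zip(ids, dog_probs):
            writer.writerow([image_id, prob])
```

Here ids and dog_probs are plain Python sequences of equal length; how you extract them from the predict() output depends on how your notebook orders the test files.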

(leahob) #18

At this stage of the class, what knobs could we be using for fine-tuning VGG16 for dogs vs cats and scoring “well” (within top half of leaderboard) in the Kaggle competition?

  • running more epochs?
  • tweaking the optimizer parameters? (not really covered yet…)

** I wanted to add that I was able to submit and get on the board but a little bit shy of the top half :slight_smile:

(melissa.fabros) #19

I think discussion about how to tune the model is happening here:

(melissa.fabros) #20

If anyone is looking for documentation on the methods that a VGG object takes:

Could be helpful in tuning the prediction or understanding the parameters and returns of methods like predict()