Lesson 2: Downloading + Formatting Dog Breed Data

I am following along with Lesson 2 and downloaded the dataset to data/dogbreed using the Chrome Wget extension.

My current working Git Repo can be found here.


The issue I am running into is that the data does not appear to be divided into training and validation sets. When I look at it through the Crestle terminal, it is just a flat series of images with no folders or subdirectories whatsoever.

Here is what I get when I run the ls -d */ command from the terminal:

Here is the last line of the error message that Jupyter NB is returning (full message in the Repo).

FileNotFoundError: File b'data/dogbreed/labels.csv' does not exist

Just to be sure, I downloaded the data straight to my laptop and found the exact same state of affairs. If Kaggle has changed the way they store the data for this challenge, that’s fine. I would love to hear others' suggestions for a programmatic way to get the file structure set up and formatted for training.

If Kaggle has not changed the way the data is stored, then I would like to hear any ideas on why the code from the video is not working for me on Crestle.

Note: I now see that there are multiple zip files to be downloaded on the data page for the Kaggle challenge. I’ll download those and re-run the NB to see if that takes care of things.

Update: Downloading each of the additional zip files via the terminal and unzipping them solved the issue.

not sure why you used wget when the video shows use of the kaggle tool to download data.

I’m lazy and dislike creating work for myself.


kaggle competitions download -c dog-breed-identification

This will put the data in ~/.kaggle/competitions/dog-breed-identification/

Then unzip to a directory of your choice. The video uses “data/dogbreed/”, which is relative to the directory your notebook is in.
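
If you would rather do the unzip step from the notebook than from the terminal, something like the following might work. It is only a rough sketch using Python's standard zipfile module: the function names extract_archives and missing_files are made up for illustration, and it assumes every archive in the download directory ends in .zip and that labels.csv is the file the notebook needs (going by the FileNotFoundError above).

```python
import zipfile
from pathlib import Path


def extract_archives(src_dir, dest_dir):
    """Extract every .zip file found in src_dir into dest_dir.

    Hypothetical helper: returns the sorted names of the archives extracted.
    """
    src = Path(src_dir).expanduser()   # allows paths starting with "~"
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(src.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        extracted.append(archive.name)
    return extracted


def missing_files(dest_dir, expected=("labels.csv",)):
    """Return the expected file names that are not present in dest_dir."""
    dest = Path(dest_dir)
    return [name for name in expected if not (dest / name).exists()]
```

Calling extract_archives("~/.kaggle/competitions/dog-breed-identification", "data/dogbreed") from the notebook's directory and then missing_files("data/dogbreed") should give an empty list if labels.csv was among the downloaded archives. If it still reports labels.csv as missing, one of the zip files on the data page probably was not downloaded.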