Lesson 2: Downloading + Formatting Dog Breed Data

P_Dubz · April 21, 2018, 7:58pm

So I am working on following along with Lesson 2. I downloaded the dataset to data/dogbreed using the Chrome Wget extension.

My current working Git Repo can be found here.

https://github.com/Hack-My-Life/DogBreed/blob/master/dogbreed-training.ipynb

The issue I am running into is that the data seems to be totally unformatted in terms of being divided into training and validation sets. When I look at the data through the Crestle terminal, it is just a flat series of all the images, with no folders or subdirectories whatsoever.

Here is what I get when I run the ls -d */ command from the terminal:

Here is the last line of the error message that Jupyter NB is returning (full message in the Repo).

FileNotFoundError: File b'data/dogbreed/labels.csv' does not exist

Just to be sure, I downloaded the data straight to my laptop and found the exact same state of affairs. If Kaggle has changed the way they store the data for this challenge, that’s fine. I would love to hear others’ suggestions for a programmatic method for getting the file structure set up and formatted for training.

If Kaggle has not changed the way the data is stored, then I am curious as to any ideas on why the code from the video has not been working for me on Crestle.

Note: I now see that there are multiple zip files to be downloaded on the data page for the Kaggle challenge. I’ll download those and re-run the NB to see if that takes care of things.

Update: Downloading each of the additional zip files via the terminal and unzipping them solved the issue.
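
For anyone who would rather do the unzipping step from the notebook than from the terminal, here is a minimal sketch using only the Python standard library. The function name extract_archives is made up for illustration, and it assumes the downloaded zip files sit directly inside the data directory (data/dogbreed in this thread).

```python
import zipfile
from pathlib import Path


def extract_archives(data_dir):
    """Extract every .zip file found directly inside data_dir, in place.

    Returns the names of the archives that were extracted, sorted by name.
    """
    data_dir = Path(data_dir)
    extracted = []
    for archive in sorted(data_dir.glob("*.zip")):
        # Unpack next to the archive so labels.csv and the image folders
        # end up where the notebook looks for them.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
        extracted.append(archive.name)
    return extracted
```

Calling extract_archives("data/dogbreed") from the notebook's directory should then leave the unpacked files alongside the archives.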

MTAU · May 7, 2018, 5:30am

not sure why you used wget when the video shows use of the kaggle tool to download data.

I’m lazy and dislike creating work for myself.

https://www.kaggle.com/c/dog-breed-identification/submissions?sortBy=date&group=all&page=1

kaggle competitions download -c dog-breed-identification

this will put the data in ~/.kaggle/competitions/dog-breed-identification/

then unzip to dir of your choice. the video shows use of “data/dogbreed/” which will be relative to the dir your notebook is in.
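
Since the FileNotFoundError above comes down to where that relative path points, a quick check like the one below can help. This is only a sketch: missing_entries is a made-up helper, labels.csv is the file named in the error message, and the train and test folder names are my assumption about what the unzipped archives contain.

```python
from pathlib import Path


def missing_entries(data_dir, expected=("labels.csv", "train", "test")):
    """Return the names from `expected` that do not exist inside data_dir.

    The default names are assumptions; only labels.csv is confirmed by the
    error message in this thread.
    """
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).exists()]
```

Running Path("data/dogbreed").resolve() in a notebook cell shows the absolute location the relative path maps to, and missing_entries("data/dogbreed") returning an empty list suggests the expected files are in place.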