How to download data for Lesson 2 from Kaggle for Planet Competition


(Stas Bekman) #41

Combining all the useful suggestions from this thread, here is what I needed to do. Hopefully you can just copy and paste it:

### kaggle (native) tool setup ###
pip install kaggle
mkdir -p ~/.kaggle/
# get the API key from your account:
# 1. visit https://www.kaggle.com/ => login => My Account
#    e.g. https://www.kaggle.com/YourUsername/account
# 2. hit [Create New API Token]
# 3. save the file as ~/.kaggle/kaggle.json
# 4. set permissions
chmod 600 ~/.kaggle/kaggle.json
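
If the kaggle tool later complains about credentials, a quick check of the token file can help. This is only a minimal sketch: check_kaggle_token is a hypothetical helper name (not part of the kaggle tool), and it assumes GNU stat as found on Linux.

```shell
# Hypothetical helper (not part of the kaggle tool): succeeds only if the
# token file exists and its permission bits are exactly 600.
# Assumes GNU coreutils `stat` (Linux); BSD/macOS stat takes different flags.
check_kaggle_token() {
    token=${1:-"$HOME/.kaggle/kaggle.json"}
    if [ ! -f "$token" ]; then
        echo "missing: $token" >&2
        return 1
    fi
    mode=$(stat -c '%a' "$token") || return 1
    if [ "$mode" != "600" ]; then
        echo "permissions are $mode, expected 600: $token" >&2
        return 1
    fi
    echo "ok: $token"
}
```

Call it with no arguments to check the default location, or pass another path as the first argument.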

### get the data for the competition ###

# 1. accept the rules here:
#    https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/rules
#    (you may need to verify your kaggle account/phone for this to work)

# 2. download data
COMPETITION=planet-understanding-the-amazon-from-space
DATA=~/data/planet  # your path to data
mkdir -p $DATA
# list the available files first
kaggle competitions files -c $COMPETITION
kaggle competitions download -c $COMPETITION -f train-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg-additional.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f train_v2.csv.zip -p $DATA
kaggle competitions download -c $COMPETITION -f test_v2_file_mapping.csv.zip -p $DATA
kaggle competitions download -c $COMPETITION -f sample_submission_v2.csv.zip -p $DATA
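
The six download commands above can also be written as a loop. This is a sketch only: download_planet_files and the KAGGLE_BIN variable are names I made up for illustration. KAGGLE_BIN defaults to kaggle, and setting it to echo gives a dry run that prints the arguments instead of downloading.

```shell
# Hypothetical wrapper around the download commands above.
# KAGGLE_BIN is an assumed override hook (e.g. KAGGLE_BIN=echo for a dry run);
# it is not something the kaggle tool itself knows about.
download_planet_files() {
    competition=$1   # competition slug
    data=$2          # target directory
    mkdir -p "$data" || return 1
    for f in train-jpg.tar.7z test-jpg.tar.7z test-jpg-additional.tar.7z \
             train_v2.csv.zip test_v2_file_mapping.csv.zip \
             sample_submission_v2.csv.zip; do
        "${KAGGLE_BIN:-kaggle}" competitions download \
            -c "$competition" -f "$f" -p "$data" || return 1
    done
}
```

Usage would be: download_planet_files $COMPETITION $DATA. The loop stops at the first failed download, so a 403 shows up right away instead of six times.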

# 3. unpack/cleanup
sudo apt install p7zip-full
cd $DATA
7z x -so train-jpg.tar.7z | tar xf - 
7z x -so test-jpg.tar.7z | tar xf - 
7z x -so test-jpg-additional.tar.7z | tar xf - 
unzip train_v2.csv.zip
unzip test_v2_file_mapping.csv.zip
unzip sample_submission_v2.csv.zip
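
If you downloaded a different set of files, a small dry-run helper can print the matching unpack command for each archive. This is a sketch under assumptions: unpack_cmd and unpack_plan are hypothetical names, the commands are the same ones used above, and file names are assumed to contain no spaces.

```shell
# Hypothetical helper: print (do not run) the unpack command for one archive.
unpack_cmd() {
    case "$1" in
        *.tar.7z) echo "7z x -so $1 | tar xf -" ;;
        *.7z)     echo "7z x $1" ;;
        *.zip)    echo "unzip $1" ;;
        *)        echo "# skipping $1: unknown archive type" ;;
    esac
}

# Print the plan for every regular file in a directory (default: current one).
# Only the bare file names are printed, so run the plan from inside that directory.
unpack_plan() {
    for f in "${1:-.}"/*; do
        [ -f "$f" ] && unpack_cmd "${f##*/}"
    done
    return 0
}
```

Review the output of unpack_plan first; once it looks right, running unpack_plan | sh from inside $DATA executes it.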

# optional cleanup:
# rm *zip *7z
# rm -rf __MACOSX

Thank you all who contributed code and suggestions!

update:

Further simplification: if you want to download (1) all files at once and (2) into the current folder, there is the -w option:

COMPETITION=planet-understanding-the-amazon-from-space
kaggle competitions download -w -c $COMPETITION

Do check first whether you want all the files: some of them are optional and huge! To list what’s available, do:

kaggle competitions files -c $COMPETITION

#42

@stas thanks, your post was very helpful. After working through some snags I was able to download the kaggle files and step through the lesson 2 notebook.

Among the snags (if anyone cares):

  • In trying to install kaggle I get a version skew error: it uses some component that requires an older version of regex:

spacy 2.0.11 has requirement regex==2017.4.5, but you’ll have regex 2017.11.9 which is incompatible.

I thought maybe there was a newer version of spacy, but “pip install -U spacy” kept it at 2.0.11 and actually downgraded regex to 2017.4.5. Seems a little odd but it seems to have worked.

  • Signing up for kaggle was a big pain in its own right (had to use my phone since it doesn’t like my firefox setup, had to fumble through two recaptchas guessing pictures of cars to deal with their login and phone number verifications).

  • kg download then said the files were already downloaded, and then my paperspace instance froze, maybe from an idle timeout since I use ssh instead of their web console. Restarting the machine got the download to work.

  • kg download also fetched the tif files; I hadn’t spotted that they weren’t needed.

  • After all this I reproduced all the notebook steps and they worked, but I’m not sure what I got out of it (it felt like a repeat of lesson 1). I didn’t have ideas for other interesting things to try with the data. I may post in another thread about that.


(Subigya) #43

As of July 2, 2018, there is no download button. I couldn’t use the CurlWget extension in Chrome either.


(William) #44

Put your mouse over “train-tif-v2.tar.7z” and you will see a little download button on the same line. You can also use the Kaggle API or CLI.


(Peter) #45

Is there a brave soul who can compile the entirety of all the DATA for all the lessons and slap it onto a Torrent Site? That would be… soo… amazing.


(Subigya) #46

No matter where I put my mouse, I can’t download. :frowning:


#47

Quick Question: Why is kaggle-cli preferable to using the large “download” button on the competition page?


(Yohei Komori) #48

I downloaded the specific file I want by:

$ wget https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/download/train.csv


#49

In case anyone is having the same issue that I had: I kept getting a “403 forbidden” when trying to download via the kaggle cli, even though I had set up my credentials in the config. And when I went to the site, there was no link to download the data, and the site timed out when I tried to accept the competition rules.

It turned out that my Kaggle account wasn’t verified! (I made it a few years ago, possibly before they set up SMS verification.)

What I had to do was click “Late submission” on the competition page, which prompted me to do the SMS account verification. (I’m sure you can do it in the account settings too.) Then I could accept the competition rules, and then the kaggle cli download worked fine.


(Iram Shahzadi) #50

this is the best method :slight_smile:


(narendra sahu) #51

Hello all,
Does anyone know how to unzip the grocery sales data (the Favorita grocery competition on Kaggle)? The data are in .csv.7z format. I am using Crestle.
Running ‘unzip train.csv.7z’ shows an error.
Also, ‘sudo apt-get install p7zip-rar’ shows the error ‘unable to change to rootgrid’…


#52

You may try this:

7za x train.csv.7z
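
Which 7-Zip command exists depends on what is installed, so a small lookup can save some guessing. This is only a sketch: find_7z is a hypothetical helper, and 7z, 7za and 7zr are, as far as I know, the binary names shipped by the p7zip packages on Debian/Ubuntu.

```shell
# Hypothetical helper: print the name of the first 7-Zip command found on PATH.
# The candidate names are an assumption based on the Debian/Ubuntu p7zip packages.
find_7z() {
    for candidate in 7z 7za 7zr; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no 7-Zip binary found; try installing p7zip-full" >&2
    return 1
}
```

With that in place, "$(find_7z)" x train.csv.7z extracts the archive with whichever binary is available.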

(Kaspar Lund) #53

Plus you have to go to the data section and accept the rules.

