How to download the Lesson 2 data from Kaggle for the Planet competition

for f in test-jpg-additional.tar.7z test-jpg.tar.7z test_v2_file_mapping.csv.zip train-jpg.tar.7z train_v2.csv.zip
do
   kg download -f "$f"
done

I followed the steps from the first post, and kg download gives me this error:
'NoneType' object has no attribute 'find_all'

I installed the cli using the --upgrade option.

@pnvijay maybe you can add this to the guide as well. I just started with Fast.ai and kept running into disk space errors on Paperspace with this dataset until I checked the kaggle-cli GitHub page for info. This really saved me.


Hi @Priit, Will include.


@jeremy I am not able to edit my top post to mention that individual files can also be downloaded via kaggle-cli. Can you please help?

Just wanted to mention that Kaggle has finally released the official CLI tool.

Although detailed instructions are available on GitHub, here is a brief usage cheat sheet:

  1. Go to https://kaggle.com/YOUR_KAGGLE_USERNAME/account and click the “Create API Token” button.
  2. Save the token to ~/.kaggle/kaggle.json on the target machine.
  3. Copy/remember the competition name from the URL.
  4. Don’t forget to accept the competition rules.
# Make sure you run this inside a conda environment
pip install kaggle
# Secure the credentials
chmod 600 ~/.kaggle/kaggle.json
# List all files for a competition
kaggle competitions files -c COMPETITION_NAME
# Download a single file to the current directory
kaggle competitions download -c COMPETITION_NAME -f DATASET_FILE -w

Thank you, @pnvijay. Point 3 mentions kg download which downloads all the files, including the tif ones which are huge. To avoid that, I used kg download -f <filename> to download specific files.

Hi Abhirammv, thanks for the feedback. I want to incorporate the changes into my original post, but I am currently not able to edit it. It looks like there is an edit counter limit: I had edited the post around 3 times, and it no longer allows me to edit again.


Speaking of the new API, one can use the following commands to download the required data (it seems that the official Kaggle API doesn’t support passing several file names in a single command for now):

COMPETITION=planet-understanding-the-amazon-from-space
DATA=/home/user/data/kaggle/planet  # your path to data
kaggle competitions download -c $COMPETITION -f train-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg-additional.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f train_v2.csv.zip -p $DATA
kaggle competitions download -c $COMPETITION -f test_v2_file_mapping.csv.zip -p $DATA
kaggle competitions download -c $COMPETITION -f sample_submission_v2.csv.zip -p $DATA
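Since each call takes a single file name, the repeated commands above can be folded into a loop. Here is a minimal sketch, assuming a POSIX shell; `download_files` and the `KAGGLE_BIN` variable are names I made up for illustration (they are not part of the Kaggle tool), and `KAGGLE_BIN` exists only so the loop can be dry-run with `echo`.

```shell
# Sketch: download a list of competition files one at a time.
# KAGGLE_BIN is a hypothetical override (not a kaggle feature); set it to
# "echo" to print the commands instead of running them.
KAGGLE_BIN=${KAGGLE_BIN:-kaggle}

download_files() {
    competition=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        # Same flags as in the post above: -c competition, -f file, -p target dir
        "$KAGGLE_BIN" competitions download -c "$competition" -f "$f" -p "$dest" || {
            echo "download failed: $f" >&2
            return 1
        }
    done
}

# Example call (commented out so that pasting the block downloads nothing):
# download_files planet-understanding-the-amazon-from-space ~/data/planet \
#     train-jpg.tar.7z test-jpg.tar.7z test-jpg-additional.tar.7z \
#     train_v2.csv.zip test_v2_file_mapping.csv.zip sample_submission_v2.csv.zip
```

Stopping at the first failed file is a design choice here; it makes a 403 or a full disk visible right away instead of scrolling past.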

Thank you very much for the info!

How do I get the kaggle.json file into ~/.kaggle/?
With “ls -la” I don’t even see the “.kaggle” folder.

Best regards
Michael

mkdir -p ~/.kaggle
mv path-to-the-downloaded-file ~/.kaggle/kaggle.json
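The two commands above, plus the chmod step mentioned earlier in the thread, can be wrapped into one small helper. This is only a sketch: `install_kaggle_token` is a hypothetical name, and the optional second argument (the config directory) is my addition so the helper can be tried out without touching the real ~/.kaggle.

```shell
# Sketch: move a downloaded API token into place and restrict its permissions.
# install_kaggle_token is a hypothetical helper name; the config directory
# defaults to ~/.kaggle, the location used in this thread.
install_kaggle_token() {
    src=$1
    conf_dir=${2:-$HOME/.kaggle}
    if [ ! -f "$src" ]; then
        echo "token file not found: $src" >&2
        return 1
    fi
    mkdir -p "$conf_dir" || return 1
    mv "$src" "$conf_dir/kaggle.json" || return 1
    # Owner read/write only, as recommended earlier in the thread
    chmod 600 "$conf_dir/kaggle.json"
}
```

Call it with the path where your browser saved the file, for example `install_kaggle_token path-to-the-downloaded-file`.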

Combining all the useful suggestions from this thread, here is what I needed to do - and hopefully you just need to copy-n-paste it to work:

### kaggle (native) tool setup ###
pip install kaggle
mkdir -p ~/.kaggle/
# get the API key from your account:
# 1. visit https://www.kaggle.com/ => login => My Account
#    e.g. https://www.kaggle.com/YourUsername/account
# 2. hit [Create New API Token]
# 3. save the file as ~/.kaggle/kaggle.json
# 4. set permissions
chmod 600 ~/.kaggle/kaggle.json

### get the data for the competition ###

# 1. accept the rules here:
# https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/rules
# (you may need to verify your kaggle account/phone for this to work)

# 2. download data
kaggle competitions files -c planet-understanding-the-amazon-from-space
COMPETITION=planet-understanding-the-amazon-from-space
DATA=~/data/planet  # your path to data
mkdir -p $DATA
kaggle competitions download -c $COMPETITION -f train-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg-additional.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f train_v2.csv.zip -p $DATA
kaggle competitions download -c $COMPETITION -f test_v2_file_mapping.csv.zip -p $DATA
kaggle competitions download -c $COMPETITION -f sample_submission_v2.csv.zip -p $DATA

# 3. unpack/cleanup
sudo apt install p7zip-full
cd $DATA
7z x -so train-jpg.tar.7z | tar xf - 
7z x -so test-jpg.tar.7z | tar xf - 
7z x -so test-jpg-additional.tar.7z | tar xf - 
unzip train_v2.csv.zip
unzip test_v2_file_mapping.csv.zip
unzip sample_submission_v2.csv.zip

# optional cleanup:
# rm *zip *7z
# rm -rf __MACOSX
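The unpack step can also be written as a loop over whatever archives are present, so it keeps working if you downloaded only some of the files. A minimal sketch, assuming a POSIX shell with 7z and unzip installed; `unpack_all` and `DRY_RUN` are hypothetical names of mine, and `unzip -o` (overwrite without prompting) is my choice, not something from the steps above.

```shell
# Sketch: unpack every .tar.7z and .zip archive found in a directory.
# unpack_all and DRY_RUN are hypothetical names; with DRY_RUN=1 the function
# only prints what it would run.
unpack_all() {
    dir=${1:-.}
    (
        cd "$dir" || exit 1
        for f in *.tar.7z; do
            [ -e "$f" ] || continue   # the glob matched nothing
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "7z x -so $f | tar xf -"
            else
                # Stream the inner tar to stdout so it never lands on disk
                7z x -so "$f" | tar xf - || exit 1
            fi
        done
        for f in *.zip; do
            [ -e "$f" ] || continue
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "unzip $f"
            else
                # -o overwrites existing files without prompting
                unzip -o "$f" || exit 1
            fi
        done
    )
}
```

Running `DRY_RUN=1; unpack_all ~/data/planet` first shows which archives would be extracted; the subshell keeps the `cd` from changing your current directory.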

Thank you all who contributed code and suggestions!

Update:

Further simplification: if you want to download (1) all files at once (2) into the current folder, there is the -w option:

COMPETITION=planet-understanding-the-amazon-from-space
kaggle competitions download -w -c $COMPETITION

Do check first whether you want all the files - some of them can be optional and huge! To list what’s available, do:

kaggle competitions files -c $COMPETITION
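Because some of the files are huge and disk space errors came up earlier in the thread, it may help to compare the listed sizes against the free space before downloading. A rough sketch, assuming a POSIX `df`; `free_kb` and `require_free_gb` are hypothetical helper names, and the threshold you pass is your own estimate, not a number taken from Kaggle.

```shell
# Sketch: refuse to start a large download when the target file system is
# short on space. free_kb and require_free_gb are hypothetical helper names.
free_kb() {
    # df -P prints one POSIX-format line per file system; column 4 is the
    # available space, counted in 1024-byte blocks because of -k.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

require_free_gb() {
    path=$1
    need_gb=$2
    have_kb=$(free_kb "$path")
    [ -n "$have_kb" ] || return 1
    need_kb=$((need_gb * 1024 * 1024))
    if [ "$have_kb" -lt "$need_kb" ]; then
        echo "only $((have_kb / 1024 / 1024)) GB free in $path, need $need_gb GB" >&2
        return 1
    fi
}
```

For example, `require_free_gb . NEEDED_GB && kaggle competitions download -w -c $COMPETITION`, where NEEDED_GB is a placeholder for the total you read off the file listing, with some room left for unpacking. The path must already exist, since `df` is run against it.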

@stas thanks, your post was very helpful. After working through some snags I was able to download the kaggle files and step through the lesson 2 notebook.

Among the snags (if anyone cares):

  • In trying to install kaggle I get a version skew error: it uses some component that requires an older version of regex:

spacy 2.0.11 has requirement regex==2017.4.5, but you'll have regex 2017.11.9 which is incompatible.

I thought maybe there was a newer version of spacy, but “pip install -U spacy” kept it at 2.0.11 and actually downgraded regex to 2017.4.5. Seems a little odd but it seems to have worked.

  • Signing up for kaggle was a big pain in its own right (had to use my phone since it doesn’t like my firefox setup, had to fumble through two recaptchas guessing pictures of cars to deal with their login and phone number verifications).

  • kg download then said the files were already downloaded, and then my paperspace instance froze, maybe from an idle timeout since I use ssh instead of their web console. Restarting the machine got the download to work.

  • kg download got the tif files which I hadn’t spotted weren’t needed.

  • After all this I reproduced all the notebook steps and they worked, but I’m not sure what I got out of it (it felt like a repeat of lesson 1). I didn’t have ideas for other interesting things to try with the data. I may post in another thread about that.


As of July 2, 2018, there is no download button. I couldn’t use curlwget extension in chrome.

Put your mouse over “train-tif-v2.tar.7z” and you will see a small download button on the same line. You can also use the Kaggle API or CLI.

Is there a brave soul who can compile all the data for all the lessons and slap it onto a torrent site? That would be… soo… amazing.

No matter where I put my mouse, I can’t download. :frowning:

Quick Question: Why is kaggle-cli preferable to using the large “download” button on the competition page?

I downloaded the specific file I want by:

$ wget https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/download/train.csv


In case anyone is having the same issue that I had: I kept getting a “403 forbidden” when trying to download via the kaggle cli, even though I had set up my credentials in the config. And when I went to the site, there was no link to download the data, and the site timed out when I tried to accept the competition rules.

It turned out that my Kaggle account wasn’t verified! (I made it a few years ago, possibly before they set up SMS verification.)

What I had to do was click “Late submission” on the competition page, which prompted me to do the SMS account verification. (I’m sure you can do it in the account settings too.) Then I could accept the competition rules, and then the kaggle cli download worked fine.
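If you hit the same 403, it can save time to rule out the credentials file before digging into account verification. A small sketch, assuming the token is the usual JSON file with "username" and "key" fields; `check_kaggle_token` is a hypothetical helper name, and it inspects only the local file, so it cannot tell you whether the account is verified or the rules are accepted.

```shell
# Sketch: sanity-check the API token file before blaming the account.
# check_kaggle_token is a hypothetical helper; the default path is the
# ~/.kaggle/kaggle.json location used throughout this thread.
check_kaggle_token() {
    token=${1:-$HOME/.kaggle/kaggle.json}
    if [ ! -f "$token" ]; then
        echo "missing: $token" >&2
        return 1
    fi
    # The token is a small JSON object with "username" and "key" fields.
    for field in username key; do
        if ! grep -q "\"$field\"" "$token"; then
            echo "no \"$field\" field in $token" >&2
            return 1
        fi
    done
    # Warn (but do not fail) when group or others have any access to the file.
    perms=$(ls -l "$token" | cut -c5-10)
    if [ "$perms" != "------" ]; then
        echo "warning: $token is accessible to other users; run chmod 600 $token" >&2
    fi
}
```

If `check_kaggle_token` passes and the download still returns 403, the account verification and rule acceptance steps described above are the next things to look at.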
