How to download data for Lesson 2 from Kaggle for Planet Competition

(Vijay Narayanan Parakimeethal) #21

Can you install 7-Zip on the Crestle instance using

sudo apt-get install p7zip-rar

After that, please try

7za x <filename.tar.7z>
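
A note on the second step: on a .tar.7z file, 7za x only gives you the intermediate .tar, which still has to be unpacked with tar. Below is a minimal sketch that does both in one pass. It assumes 7za and tar are on your PATH, and extract_tar7z is just a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a .tar.7z archive in one pass.
# 7za's -so switch writes the decompressed .tar to stdout, and tar reads
# that stream from stdin ("-f -"), so no intermediate .tar is left on disk.
extract_tar7z() {
  local archive=$1 dest=${2:-.}
  mkdir -p "$dest" || return 1
  7za x -so "$archive" | tar -xf - -C "$dest"
  # Capture both exit codes at once; PIPESTATUS is overwritten by the next command.
  local rc=("${PIPESTATUS[@]}")
  [ "${rc[0]}" -eq 0 ] && [ "${rc[1]}" -eq 0 ]
}
```

For example, extract_tar7z train-jpg.tar.7z ~/data/planet would unpack the archive into ~/data/planet without leaving a .tar file behind.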

(Vijay Narayanan Parakimeethal) #22

@anurag, need your help here. It looks like on Crestle you cannot install kaggle-cli or unpack .tar.7z files; there is an error saying that the lxml<4.1,>=4.0.0 distribution is needed. Can you help @memetzgz with this, as she is using Crestle?

(Maureen Metzger) #23

Thanks for your help, Vijay. Yes, this is the error I’m getting.

(Anurag Goel) #24

I’ll look into the test data.

(Anurag Goel) #25

Crestle does have the test-jpg, test-jpg-additional and test-tif-2 folders with ~40k/20k/61k images respectively. Are you running into issues with using them?
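
To check those counts on your own instance, something like the following sketch prints the number of files in each folder. The helper name is hypothetical, and it assumes you run it from the directory that contains the folders:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<folder> <number of files>" for each folder given.
# Counts regular files directly inside each folder (not in subfolders).
# -H makes find follow a folder argument that is itself a symlink.
count_files() {
  local d n
  for d in "$@"; do
    if [ -d "$d" ]; then
      n=$(find -H "$d" -mindepth 1 -maxdepth 1 -type f | wc -l)
      # $((n)) strips the padding some wc implementations add
      echo "$d $((n))"
    else
      echo "$d missing"
    fi
  done
}
```

Usage: count_files test-jpg test-jpg-additional test-tif-2. Following symlinked folder arguments matters here, since the folders may have been linked into place rather than copied.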

(Maureen Metzger) #26

Hi @anurag, then the problem may be that I did not create the right symlinks. I will check when I next log on. I appreciate you looking into this on your end. Crestle is working very well otherwise!
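
For reference, creating such a link could look like the sketch below. The helper name is hypothetical, and so are the paths you pass it. This thread doesn't say where the Planet folders live on Crestle, so substitute the real source folder and the path your notebook expects:

```shell
#!/usr/bin/env bash
# Hypothetical helper: make DEST a symlink to the existing dataset folder SRC.
link_dataset() {
  local src=$1 dest=$2
  if [ ! -d "$src" ]; then
    echo "source folder not found: $src" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")" || return 1
  # -s: symbolic link; -f: replace an existing destination link;
  # -n: don't descend into DEST if it is already a link to a directory
  ln -sfn "$src" "$dest"
}
```

Usage: link_dataset /path/to/planet ~/data/planet (both paths are placeholders). Afterwards, ls -l ~/data/planet shows where the link points.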

(Vikrant Behal) #27

It shows 1.7G for me.

(Sudarsan Padmanabhan) #28

It is also possible to download a specific file using kaggle-cli:

$ kg download -u <username> -p <password> -c <competition> -f <filename>

(Jeremy Howard) #29

Thanks for the tip, I didn’t know that 🙂

(Clark Updike) #30

for f in test-jpg-additional.tar.7z test-jpg.tar.7z train-jpg.tar.7z; do
   kg download -f "$f"
done

(aswin) #31

I followed the steps from the first post, and kg download gives me this error:

‘NoneType’ object has no attribute ‘find_all’

I installed the CLI using the --upgrade option.

(Priit) #32

@pnvijay, maybe you can include this in the guide as well. I had just started with this dataset and kept running into disk space errors on Paperspace until I checked the kaggle-cli GitHub page for info; this really saved me.
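
Since disk space was the problem here, a quick check before downloading can save a failed transfer. Below is a minimal sketch using df -Pk (portable one-line output in 1 KiB blocks). The helper names and the threshold you pass are illustrative assumptions, not part of kaggle-cli:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the free space, in whole GiB, of the filesystem holding DIR.
# Column 4 of df -P output is the space available to unprivileged users.
free_gib() {
  local dir=${1:-.}
  df -Pk "$dir" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Hypothetical helper: succeed only if DIR has at least NEEDED GiB free.
has_space() {
  local dir=$1 needed=$2
  [ "$(free_gib "$dir")" -ge "$needed" ]
}
```

Usage: has_space ~/data 20 || echo "less than 20 GiB free". The 20 is an arbitrary example; pick a figure based on the files you actually plan to fetch.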

(Vijay Narayanan Parakimeethal) #33

Hi @Priit, will include it.

(Vijay Narayanan Parakimeethal) #34

@jeremy, I am not able to edit my top post to add that individual files can also be downloaded via kaggle-cli. Can you please help?

(Emil) #35

Just wanted to mention that Kaggle has finally released the official CLI tool.

Although the detailed instructions are available on GitHub, here is a brief usage cheat sheet:

  1. Go to and click the “Create API Token” button.
  2. Save the token to ~/.kaggle/kaggle.json on the target machine.
  3. Copy/remember the competition name from the URL.
  4. Don’t forget to accept the competition rules.

# Make sure you run this inside a conda environment
pip install kaggle
# Secure the credentials
chmod 600 ~/.kaggle/kaggle.json
# List all files for a competition
kaggle competitions files -c COMPETITION_NAME
# Download a single file to the current directory
kaggle competitions download -c COMPETITION_NAME -f DATASET_FILE -w
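
Before running the commands above, a quick check that the token is in place with the permissions from the chmod step might look like this. It is a sketch with a hypothetical helper name; it uses GNU stat -c %a and falls back to the BSD/macOS form stat -f %Lp:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the Kaggle token exists and has mode 600.
check_kaggle_token() {
  local token=${1:-$HOME/.kaggle/kaggle.json} mode
  if [ ! -f "$token" ]; then
    echo "missing: $token" >&2
    return 1
  fi
  # GNU stat uses -c %a; BSD/macOS stat uses -f %Lp. Both print e.g. 600.
  mode=$(stat -c %a "$token" 2>/dev/null || stat -f %Lp "$token")
  if [ "$mode" != "600" ]; then
    echo "permissions are $mode; run: chmod 600 $token" >&2
    return 1
  fi
}
```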

(Abhiram MV) #36

Thank you, @pnvijay. Point 3 mentions kg download, which downloads all the files, including the .tif ones, which are huge. To avoid that, I used kg download -f <filename> to download specific files.

(Vijay Narayanan Parakimeethal) #37

Hi Abhirammv, thanks for the feedback. I want to incorporate the changes into my original post, but I can't edit it at the moment. It looks like there is a limit on edits: I had edited the post around 3 times, and now it won't let me edit it again.


Speaking of the new API, one could use the following commands to download the required data (it seems that the official Kaggle API doesn’t support passing several file names in a single command for now):

COMPETITION=planet-understanding-the-amazon-from-space  # competition name from the URL
DATA=/home/user/data/kaggle/planet  # your path to data
kaggle competitions download -c $COMPETITION -f train-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg-additional.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f -p $DATA
kaggle competitions download -c $COMPETITION -f -p $DATA
kaggle competitions download -c $COMPETITION -f -p $DATA

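Since each call fetches a single file, the repeated commands can also be wrapped in a loop. Below is a minimal sketch with a hypothetical function name, using only the -c, -f and -p options shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: download each named file of a competition, one call per file.
# Usage: download_competition_files COMPETITION DEST FILE...
download_competition_files() {
  local competition=$1 dest=$2 f
  shift 2
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    echo "Downloading $f ..."
    # Stop at the first failure instead of silently skipping files
    kaggle competitions download -c "$competition" -f "$f" -p "$dest" || return 1
  done
}
```

Usage: download_competition_files "$COMPETITION" "$DATA" train-jpg.tar.7z test-jpg.tar.7z test-jpg-additional.tar.7z
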
(Michael) #39

Thank you very much for the info!

How do I get the kaggle.json file into ~/.kaggle/kaggle?
With “ls -la” I don’t even see the “.kaggle” folder.

Best regards

(Emil) #40

mkdir -p ~/.kaggle
mv path-to-the-downloaded-file ~/.kaggle/kaggle.json
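
Those two commands, plus the chmod 600 step from the cheat sheet, can be bundled into one helper. This is a sketch with a hypothetical name; pass it wherever your browser saved kaggle.json:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a downloaded kaggle.json into ~/.kaggle and lock it down.
install_kaggle_token() {
  local downloaded=$1
  [ -f "$downloaded" ] || { echo "not found: $downloaded" >&2; return 1; }
  # The hidden .kaggle folder may not exist yet, so create it first
  mkdir -p "$HOME/.kaggle" || return 1
  mv "$downloaded" "$HOME/.kaggle/kaggle.json" || return 1
  # Only your user should be able to read the API key
  chmod 600 "$HOME/.kaggle/kaggle.json"
}
```

For example: install_kaggle_token ~/Downloads/kaggle.json (the Downloads path is an assumption about where your browser saves files).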