How to download data for Lesson 2 from Kaggle for Planet Competition

(Vijay Narayanan Parakimeethal) #21

Can you install 7-Zip on the Crestle instance using:

sudo apt-get install p7zip-rar

After that, please try:

7za x <filename.tar.7z>
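The `.tar.7z` archives are tar files compressed with 7-Zip, so `7za x` on its own only produces a `.tar` that still has to be unpacked with `tar`. Below is a minimal sketch that does both steps in one pass by streaming the tar to stdout with 7za's `-so` switch. `extract_tar7z` is a made-up helper name, and the sketch assumes `7za` and `tar` are on the PATH:

```shell
# Hypothetical helper: extract a .tar.7z archive without keeping the intermediate .tar.
# Usage: extract_tar7z ARCHIVE [DEST_DIR]
extract_tar7z() {
    local archive="$1"
    local dest="${2:-.}"
    if ! command -v 7za >/dev/null 2>&1; then
        echo "7za not found; install p7zip first" >&2
        return 1
    fi
    mkdir -p "$dest"
    # 7za -so writes the decompressed .tar to stdout; tar reads it from stdin (-f -)
    7za x -so "$archive" | tar -C "$dest" -xf -
}
```

For example, `extract_tar7z train-jpg.tar.7z data/planet` should unpack the training images into `data/planet` (a placeholder path). Because the intermediate `.tar` is never written to disk, this also needs less free space than extracting in two steps.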

(Vijay Narayanan Parakimeethal) #22

@anurag Need your help here. It looks like on Crestle you cannot install kaggle-cli or unzip tar.7z files. There is an error saying that the lxml<4.1,>=4.0.0 distribution is needed. Can you help @memetzgz with this, as she is using Crestle?

(Maureen Metzger) #23

Thanks, Vijay, for your help. Yes, this is the error I’m getting.

(Anurag Goel) #24

I’ll look into the test data.

(Anurag Goel) #25

Crestle does have the test-jpg, test-jpg-additional and test-tif-2 folders with ~40k/20k/61k images respectively. Are you running into issues using them?

(Maureen Metzger) #26

Hi @anurag, then the problem may be that I did not create the right symlinks. I will check when I next log on. I appreciate you looking into this on your end. Crestle is working very well otherwise!
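If the images are already on the machine, a symlink is enough to make them show up where the notebook looks for data. Below is a minimal sketch, assuming a made-up helper `link_planet_data` and placeholder paths (the real dataset location on Crestle and the folder your notebook reads from may differ). After linking, it counts the files in each test folder it finds, so the numbers can be compared with the ~40k/20k/61k figures Anurag mentioned:

```shell
# Hypothetical helper: symlink a shared dataset folder to the path a notebook reads from,
# then report how many files each known test folder contains.
# Usage: link_planet_data SOURCE_DIR LINK_PATH
link_planet_data() {
    local src="$1"
    local link="$2"
    local d n
    if [ ! -d "$src" ]; then
        echo "source folder not found: $src" >&2
        return 1
    fi
    mkdir -p "$(dirname "$link")"
    # -s: symbolic link; -f -n: replace an existing link rather than creating a new one inside it
    ln -sfn "$src" "$link"
    for d in test-jpg test-jpg-additional test-tif-2; do
        if [ -d "$link/$d" ]; then
            # -H follows the folder itself if it happens to be a symlink too
            n=$(find -H "$link/$d" -type f | wc -l | tr -d ' ')
            printf '%s: %s files\n' "$d" "$n"
        fi
    done
}
```

Usage would look like `link_planet_data /path/to/shared/planet ~/data/planet`, where both paths are placeholders.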

(Vikrant Behal) #27

It shows 1.7G for me.

(Sudarsan Padmanabhan) #28

It is also possible to download a specific file using kaggle-cli:

$ kg download -u <username> -p <password> -c <competition> -f <filename>
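Typing the password directly on the command line leaves it in your shell history. Below is a minimal sketch that reads the credentials from environment variables instead. `KG_USER`, `KG_PASS`, `KG_COMPETITION` and the `kg_get` function are hypothetical names chosen for this example, and the default competition name is assumed to be the slug from the Planet competition URL:

```shell
# Hypothetical wrapper around kaggle-cli's `kg download` for fetching one file.
# Credentials come from environment variables (names made up for this example).
# Usage: kg_get FILENAME
kg_get() {
    local file="$1"
    local competition="${KG_COMPETITION:-planet-understanding-the-amazon-from-space}"
    if [ -z "${KG_USER:-}" ] || [ -z "${KG_PASS:-}" ]; then
        echo "set KG_USER and KG_PASS first" >&2
        return 1
    fi
    kg download -u "$KG_USER" -p "$KG_PASS" -c "$competition" -f "$file"
}
```

In bash, `read -rs KG_PASS` prompts for the password without echoing it. After `export KG_USER=yourname KG_PASS`, you can run `kg_get test-jpg.tar.7z`. The password is still passed to `kg` as an argument while it runs, so this only keeps it out of the shell history.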

(Jeremy Howard (Admin)) #29

Thanks for the tip - I didn’t know that :slight_smile:

(Clark Updike) #30

for f in test-jpg-additional.tar.7z test-jpg.tar.7z train-jpg.tar.7z; do
    kg download -f "$f"
done

(aswin) #31

I followed the steps from the first post, and kg download gives me this error:
'NoneType' object has no attribute 'find_all'

I installed the CLI using the --upgrade option.

(Priit) #32

@pnvijay, maybe you can include this in the guide as well. I have just started, and I was running into disk space errors on Paperspace with this dataset until I checked the kaggle-cli GitHub page for info; this really saved me.
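One way to stay within a small disk is to process one archive at a time: download it, unpack it, and delete it before fetching the next one. Below is a minimal sketch. It assumes kaggle-cli (`kg`) is already configured with your credentials and competition, so `-u/-p/-c` can be omitted as in the loop earlier in the thread, and that `7za` is installed. `fetch_one_at_a_time` is a hypothetical name:

```shell
# Hypothetical helper: fetch, unpack and delete one archive at a time to limit peak disk usage.
# Usage: fetch_one_at_a_time FILE...
fetch_one_at_a_time() {
    local f
    for f in "$@"; do
        kg download -f "$f" || return 1
        # stream the inner .tar straight into tar so no intermediate .tar is written
        7za x -so "$f" | tar -xf - || return 1
        rm -f "$f"               # free the space taken by the archive
        df -h . | tail -n 1      # show remaining space on this filesystem
    done
}
```

For example, `fetch_one_at_a_time train-jpg.tar.7z test-jpg.tar.7z test-jpg-additional.tar.7z`. Peak usage is then roughly one archive plus everything extracted so far, instead of all archives plus all images.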

(Vijay Narayanan Parakimeethal) #33

Hi @Priit, I will include it.

(Vijay Narayanan Parakimeethal) #34

@jeremy, I am not able to edit my top post to add that individual files can also be downloaded via kaggle-cli. Can you please help?

(Emil) #35

Just wanted to mention that Kaggle has finally released the official CLI tool.

Although the detailed instructions are available on GitHub, here is a brief usage cheat sheet:

  1. Go to and click the “Create API Token” button.
  2. Save the token to ~/.kaggle/kaggle.json on the target machine.
  3. Copy/remember the competition name from the URL.
  4. Don’t forget to accept the competition rules.
# Make sure you run this inside a conda environment
pip install kaggle
# Secure the credentials
chmod 600 ~/.kaggle/kaggle.json
# List all files for a competition
kaggle competitions files -c COMPETITION_NAME
# Download a single file to the current directory
kaggle competitions download -c COMPETITION_NAME -f DATASET_FILE -w

(Abhiram MV) #36

Thank you, @pnvijay. Point 3 mentions kg download, which downloads all the files, including the TIF ones, which are huge. To avoid that, I used kg download -f <filename> to download specific files.

(Vijay Narayanan Parakimeethal) #37

Hi Abhirammv, thanks for the feedback. I want to incorporate the changes into my original post, but I can’t edit it at the moment. It looks like there is a limit on the number of edits: I have edited the post about three times, and it won’t let me edit it again.

(Ilia) #38

Speaking of the new API, one could use the following commands to download the required data (it seems that the official Kaggle API doesn’t yet support passing several file names to a single download command):

COMPETITION=planet-understanding-the-amazon-from-space  # competition name from the URL
DATA=/home/user/data/kaggle/planet  # your path to data
kaggle competitions download -c $COMPETITION -f train-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg-additional.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f -p $DATA
kaggle competitions download -c $COMPETITION -f -p $DATA
kaggle competitions download -c $COMPETITION -f -p $DATA

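Since each download command takes a single file name, the three archive downloads above can also be written as a loop. Below is a minimal sketch. `download_planet_archives` is a hypothetical name, and `DATA` and `COMPETITION` are placeholders with defaults you may need to change; the default competition name is assumed to be the slug from the Planet competition URL:

```shell
# Placeholders: override DATA / COMPETITION before running if your setup differs.
DATA="${DATA:-$HOME/data/kaggle/planet}"
COMPETITION="${COMPETITION:-planet-understanding-the-amazon-from-space}"

# Hypothetical helper: download each archive with its own official-CLI call.
download_planet_archives() {
    local f
    mkdir -p "$DATA"
    for f in train-jpg.tar.7z test-jpg.tar.7z test-jpg-additional.tar.7z; do
        kaggle competitions download -c "$COMPETITION" -f "$f" -p "$DATA" || return 1
    done
}
```

The loop stops at the first failed download (for example, if the competition rules have not been accepted yet) instead of continuing with the rest.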
(Michael) #39

Thank you very much for the info!

How do I get the kaggle.json file into ~/.kaggle/?
With “ls -la” I don’t even see the “.kaggle” folder.

Best regards

(Emil) #40
mkdir -p ~/.kaggle
mv path-to-the-downloaded-file ~/.kaggle/kaggle.json
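Those two commands can be combined with the `chmod 600` step from the cheat sheet earlier in the thread. Below is a minimal sketch. `install_kaggle_token` is a hypothetical helper, and the source path is wherever your browser saved the file:

```shell
# Hypothetical helper: move a downloaded kaggle.json into place and restrict its permissions.
# Usage: install_kaggle_token PATH_TO_DOWNLOADED_JSON
install_kaggle_token() {
    local src="$1"
    if [ ! -f "$src" ]; then
        echo "token file not found: $src" >&2
        return 1
    fi
    # ~/.kaggle does not exist until you create it, which is why `ls -la ~` shows nothing
    mkdir -p "$HOME/.kaggle"
    mv "$src" "$HOME/.kaggle/kaggle.json"
    # the kaggle CLI warns if other users on the system can read the token
    chmod 600 "$HOME/.kaggle/kaggle.json"
    ls -la "$HOME/.kaggle"    # confirm the file is in place
}
```

For example, run `install_kaggle_token ~/Downloads/kaggle.json` (placeholder path). On a remote machine, you would first need to copy the file there, for example with `scp`.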