How to download data for Lesson 2 from Kaggle for Planet Competition

Can you install 7zip on the Crestle instance using:

sudo apt-get install p7zip-rar

After that, please try:

7za x <filename.tar.7z>
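
Note that a .tar.7z file is a tar archive wrapped in 7z, so 7za x on its own leaves a .tar file that still has to be unpacked. Here is a minimal sketch that does both steps in one go, assuming 7za and tar are on the PATH. The function name extract_tar_7z and the optional destination argument are made up for illustration:

```shell
# Hypothetical helper: unpack NAME.tar.7z into DEST (default: current directory).
extract_tar_7z() {
    archive=$1
    dest=${2:-.}
    if [ ! -f "$archive" ]; then
        echo "no such archive: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -so makes 7za write the inner tar to stdout; tar then reads it from stdin.
    7za x -so "$archive" | tar xf - -C "$dest"
}
```

It could then be called as extract_tar_7z train-jpg.tar.7z ~/data/planet, which also avoids keeping the intermediate .tar file on disk.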


@anurag Need your help here. It looks like on Crestle you cannot install kaggle-cli or extract tar.7z files. There is an error saying that the lxml<4.1,>=4.0.0 distribution is needed. Can you help @memetzgz with this, as she is using Crestle?

Thanks Vijay for your help – yes, this is the error I’m getting

I’ll look into the test data.

Crestle does have the test-jpg, test-jpg-additional and test-tif-2 folders with ~40k/20k/61k images respectively. Are you running into issues with using them?


Hi @anurag, then the problem may be that I did not create the right symlinks. I will check when I next log on. I appreciate your looking into this on your end. Crestle is working very well otherwise!


It shows 1.7G for me.

It is also possible to download a specific file using kaggle-cli:

$ kg download -u <username> -p <password> -c <competition> -f train.zip

Thanks for the tip - I didn’t know that :slight_smile:

for f in test-jpg-additional.tar.7z test-jpg.tar.7z test_v2_file_mapping.csv.zip train-jpg.tar.7z train_v2.csv.zip
do
    # One call per file; quoting keeps unusual file names intact
    kg download -f "$f"
done

I followed the steps from the first post, and kg download gives me this error.
'NoneType' object has no attribute 'find_all'

I installed the cli using the --upgrade option.

@pnvijay maybe you can include this in the guide as well. I just started with Fast.ai and kept running into disk space errors on Paperspace with this dataset until I checked the kaggle-cli GitHub page for info. This really saved me.


Hi @Priit, Will include.


@jeremy I am not able to edit my top post to include the fact that individual files can also be downloaded via kaggle-cli. Can you please help?

Just wanted to mention that Kaggle has finally released the official CLI tool.

Although the detailed instructions are available on GitHub, here is a brief usage cheat sheet:

  1. Go to https://kaggle.com/YOUR_KAGGLE_USERNAME/account and click the “Create API Token” button.
  2. Save the token to ~/.kaggle/kaggle.json on the target machine.
  3. Copy/remember the competition name from the URL.
  4. Don’t forget to accept the competition rules.
# Make sure you run this inside a conda environment
pip install kaggle
# Secure the credentials
chmod 600 ~/.kaggle/kaggle.json
# List all files for a competition
kaggle competitions files -c COMPETITION_NAME
# Download a single file to the current directory
kaggle competitions download -c COMPETITION_NAME -f DATASET_FILE -w
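
A quick sanity check before the first download can save some head-scratching. This is only a sketch: the function name check_kaggle_token is made up here, and it only confirms that the token file exists with mode 600. It does not validate the token itself.

```shell
# Hypothetical helper: succeed only if the token file exists with mode 600.
check_kaggle_token() {
    token=${1:-$HOME/.kaggle/kaggle.json}
    if [ ! -f "$token" ]; then
        echo "missing token file: $token" >&2
        return 1
    fi
    # find prints the path only when the permission bits are exactly 600.
    if [ -z "$(find "$token" -prune -perm 600)" ]; then
        echo "permissions are not 600, run: chmod 600 $token" >&2
        return 1
    fi
}
```

Running check_kaggle_token with no arguments looks at the default location, and a different path can be passed as the first argument.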

Thank you, @pnvijay. Point 3 mentions kg download which downloads all the files, including the tif ones which are huge. To avoid that, I used kg download -f <filename> to download specific files.

Hi Abhirammv, thanks for the feedback. I want to incorporate the changes into my original post, but I am currently not able to edit it. It looks like there is an edit count limit: I had edited the post around 3 times, and hence it is not allowing me to edit again.


Speaking of the new API, one can use the following commands to download the required data (it seems that the official Kaggle API doesn’t support passing several file names in a single command for now):

COMPETITION=planet-understanding-the-amazon-from-space
DATA=/home/user/data/kaggle/planet  # your path to data
kaggle competitions download -c $COMPETITION -f train-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg-additional.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f train_v2.csv.zip -p $DATA
kaggle competitions download -c $COMPETITION -f test_v2_file_mapping.csv.zip -p $DATA
kaggle competitions download -c $COMPETITION -f sample_submission_v2.csv.zip -p $DATA
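
Since each file needs its own call, the repetition can be folded into a loop. Here is a rough sketch that reuses the competition name and data path from this thread. The function name download_planet_files and the DRY_RUN switch are my own additions, so that the commands can be previewed without touching the network:

```shell
COMPETITION=planet-understanding-the-amazon-from-space
DATA=/home/user/data/kaggle/planet  # your path to data

# Hypothetical helper: download each named file with a separate call.
download_planet_files() {
    for f in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Only print the command that would be run.
            echo "kaggle competitions download -c $COMPETITION -f $f -p $DATA"
        else
            kaggle competitions download -c "$COMPETITION" -f "$f" -p "$DATA" || return 1
        fi
    done
}
```

Setting DRY_RUN=1 and then calling download_planet_files train-jpg.tar.7z test-jpg.tar.7z train_v2.csv.zip prints the three commands. With DRY_RUN unset, the same call runs them and stops at the first failure.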

Thank you very much for the info!

How do I get the kaggle.json file into ~/.kaggle/kaggle.json?
With “ls -la” I don’t even see the “.kaggle” folder.

Best regards
Michael

mkdir -p ~/.kaggle
mv path-to-the-downloaded-file ~/.kaggle/kaggle.json
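
If ls -la shows no .kaggle folder, it just has to be created first. Putting those two commands together with the chmod from the cheat sheet earlier in the thread gives something like the sketch below. The function name install_kaggle_token and its optional second argument are made up for illustration:

```shell
# Hypothetical helper: move a downloaded token into place and lock it down.
install_kaggle_token() {
    src=$1
    dir=${2:-$HOME/.kaggle}
    if [ ! -f "$src" ]; then
        echo "token not found: $src" >&2
        return 1
    fi
    mkdir -p "$dir" &&
    mv "$src" "$dir/kaggle.json" &&
    chmod 600 "$dir/kaggle.json"   # readable and writable by the owner only
}
```

For example, install_kaggle_token ~/Downloads/kaggle.json, where the Downloads path is just a guess at where the browser saved the file.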