How to download data for Lesson 2 from Kaggle for the Planet competition


(Vijay Narayanan Parakimeethal) #21

Can you install 7-Zip on the Crestle instance using:

sudo apt-get install p7zip-rar

After that, please try:

7za x <filename.tar.7z>
tar xf <filename.tar>
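Running `7za x` on a .tar.7z file leaves an intermediate .tar that still has to be unpacked with tar. If disk space is tight, the two steps can be combined into one pipeline. This is a minimal sketch, assuming p7zip's `7za` is on the PATH; `extract_tar7z` is a hypothetical helper name, not part of any tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: unpack a .tar.7z archive without writing the
# intermediate .tar file to disk.
extract_tar7z() {
  local archive="$1"
  local dest="${2:-.}"   # destination directory, defaults to the current one
  mkdir -p "$dest"
  # -so makes 7za write the decompressed stream to stdout;
  # tar reads that stream from stdin (-f -) and unpacks it into $dest.
  # pipefail (set inside a subshell so it does not leak to the caller)
  # makes a 7za failure fail the whole pipeline.
  ( set -o pipefail; 7za x -so "$archive" | tar -xf - -C "$dest" )
}
```

Usage: `extract_tar7z train-jpg.tar.7z ~/data/planet`.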


(Vijay Narayanan Parakimeethal) #22

@anurag, I need your help here. It looks like on Crestle you cannot install kaggle-cli or unzip .tar.7z files: there is an error saying that the lxml<4.1,>=4.0.0 distribution is needed. Can you help @memetzgz with this, as she is using Crestle?


(Maureen Metzger) #23

Thanks, Vijay, for your help. Yes, this is the error I'm getting.


(Anurag Goel) #24

I’ll look into the test data.


(Anurag Goel) #25

Crestle does have the test-jpg, test-jpg-additional and test-tif-2 folders with ~40k/20k/61k images respectively. Are you running into issues with using them?


(Maureen Metzger) #26

Hi @anurag, then maybe the problem is that I did not create the right symlinks. I will check when I next log on. I appreciate you looking into this on your end. Crestle is working very well otherwise!


(Vikrant Behal) #27

It shows 1.7G for me.


(Sudarsan Padmanabhan) #28

It is also possible to download a specific file using kaggle-cli:

$ kg download -u <username> -p <password> -c <competition> -f train.zip

(Jeremy Howard (Admin)) #29

Thanks for the tip. I didn't know that 🙂


(Clark Updike) #30

for f in test-jpg-additional.tar.7z test-jpg.tar.7z test_v2_file_mapping.csv.zip train-jpg.tar.7z train_v2.csv.zip
do
   kg download -f "$f"
done
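A variant of the loop above that skips files already present in the current directory, so an interrupted run can be restarted without re-downloading everything. This is a sketch, assuming kaggle-cli is already configured so that `kg download -f <file>` works as in the loop above and saves the file under the same name in the current directory. `download_missing` is a hypothetical name, and the check only tests that a file with that name exists, not that it is complete.

```bash
# Hypothetical helper: download each named file with kaggle-cli,
# unless a file with that name already exists locally.
download_missing() {
  local f
  for f in "$@"; do
    if [ -e "$f" ]; then
      echo "skipping $f (already present)"
      continue
    fi
    kg download -f "$f" || return 1   # stop at the first failed download
  done
}
```

Usage: `download_missing test-jpg-additional.tar.7z test-jpg.tar.7z test_v2_file_mapping.csv.zip train-jpg.tar.7z train_v2.csv.zip`.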

(aswin) #31

I followed the steps from the first post, and kg download gives me this error:
'NoneType' object has no attribute 'find_all'

I installed the CLI using the --upgrade option.


(Priit) #32

@pnvijay, maybe you can include this in the guide as well. I just started with fast.ai and was running into disk-space errors on Paperspace with this dataset until I checked the kaggle-cli GitHub page for info; this really saved me.


(Vijay Narayanan Parakimeethal) #33

Hi @Priit, will include it.


(Vijay Narayanan Parakimeethal) #34

@jeremy, I am not able to edit my top post to add that individual files can also be downloaded via kaggle-cli. Can you please help?


(Emil) #35

Just wanted to mention that Kaggle has finally released the official CLI tool.

Although the detailed instructions are available on GitHub, here is a brief usage cheat sheet:

  1. Go to https://kaggle.com/YOUR_KAGGLE_USERNAME/account and click the “Create API Token” button.
  2. Save the token to ~/.kaggle/kaggle.json on the target machine.
  3. Copy/remember the competition name from the URL.
  4. Don’t forget to accept the competition rules.
# Make sure you run this inside a conda environment
pip install kaggle
# Secure the credentials
chmod 600 ~/.kaggle/kaggle.json
# List all files for a competition
kaggle competitions files -c COMPETITION_NAME
# Download a single file to the current directory
kaggle competitions download -c COMPETITION_NAME -f DATASET_FILE -w
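A minimal sketch of a sanity check for the setup steps above, to run before the first download. `check_kaggle_setup` is a hypothetical helper, not part of the kaggle package. It only checks that the `kaggle` command is on the PATH and that ~/.kaggle/kaggle.json exists with mode 600 (tightening it if not); it does not verify the token with Kaggle or that you have accepted the competition rules.

```bash
# Hypothetical helper (not part of the kaggle package): check that the CLI
# is installed and that ~/.kaggle/kaggle.json exists with owner-only access.
check_kaggle_setup() {
  local token="$HOME/.kaggle/kaggle.json"
  if ! command -v kaggle >/dev/null 2>&1; then
    echo "kaggle not found on PATH; run: pip install kaggle" >&2
    return 1
  fi
  if [ ! -f "$token" ]; then
    echo "missing $token; create it with the Create API Token button" >&2
    return 1
  fi
  # find -perm 600 (no - or / prefix) matches only when the mode is exactly 600
  if [ -z "$(find "$token" -perm 600)" ]; then
    echo "tightening permissions on $token" >&2
    chmod 600 "$token"
  fi
}
```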

(Abhiram MV) #36

Thank you, @pnvijay. Point 3 mentions kg download, which downloads all the files, including the TIF ones, which are huge. To avoid that, I used kg download -f <filename> to download specific files.


(Vijay Narayanan Parakimeethal) #37

Hi Abhirammv, thanks for the feedback. I want to incorporate the changes into my original post, but I currently can't edit it. It looks like there is a limit on the number of edits: I had edited the post around 3 times, so it no longer lets me edit it.


(Ilia) #38

Speaking of the new API, you can use the following commands to download the required data (it seems the official Kaggle API doesn't currently accept a list of file names in a single command):

COMPETITION=planet-understanding-the-amazon-from-space
DATA=/home/user/data/kaggle/planet  # your path to data
kaggle competitions download -c $COMPETITION -f train-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f test-jpg-additional.tar.7z -p $DATA
kaggle competitions download -c $COMPETITION -f train_v2.csv.zip -p $DATA
kaggle competitions download -c $COMPETITION -f test_v2_file_mapping.csv.zip -p $DATA
kaggle competitions download -c $COMPETITION -f sample_submission_v2.csv.zip -p $DATA
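The six commands above differ only in the file name, so they can also be written as a loop. This is a sketch using the same `kaggle competitions download -c ... -f ... -p ...` invocation shown above; `download_planet` is a hypothetical function name, and the default for DATA is the example path from the post, so point it at your own data directory.

```bash
# Same competition and example path as above; set DATA before calling
# the (hypothetical) download_planet function to use your own directory.
COMPETITION=planet-understanding-the-amazon-from-space
DATA="${DATA:-/home/user/data/kaggle/planet}"

download_planet() {
  local f
  mkdir -p "$DATA"
  for f in train-jpg.tar.7z test-jpg.tar.7z test-jpg-additional.tar.7z \
           train_v2.csv.zip test_v2_file_mapping.csv.zip sample_submission_v2.csv.zip; do
    # one file per call, since each command takes a single -f file name
    kaggle competitions download -c "$COMPETITION" -f "$f" -p "$DATA" || return 1
  done
}
```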

(Michael) #39

Thank you very much for the info!

How do I get the kaggle.json file into ~/.kaggle/kaggle.json?
With “ls -la” I don’t even see the “.kaggle” folder.

Best regards
Michael


(Emil) #40

mkdir -p ~/.kaggle
mv path-to-the-downloaded-file ~/.kaggle/kaggle.json
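The .kaggle folder does not exist until you create it, which is why `ls -la` does not show it. The steps above, plus the `chmod 600` from the cheat sheet, can be wrapped in a small function. This is a sketch; `install_kaggle_token` is a hypothetical name, and its argument is wherever your browser saved kaggle.json.

```bash
# Hypothetical helper: move a downloaded API token into place and
# restrict it to the current user.
install_kaggle_token() {
  local src="$1"
  if [ ! -f "$src" ]; then
    echo "no such file: $src" >&2
    return 1
  fi
  mkdir -p "$HOME/.kaggle"                # created on demand; absent by default
  mv "$src" "$HOME/.kaggle/kaggle.json"
  chmod 600 "$HOME/.kaggle/kaggle.json"   # owner read/write only
}
```

Usage: `install_kaggle_token path-to-the-downloaded-file`.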