How to Download a Kaggle Competition Dataset in Crestle

Hi everyone,
I remember @jeremy gave instructions for downloading Kaggle data using the cURL command copied from Firefox.
I wanted to download https://www.kaggle.com/c/dog-breed-identification/data to experiment with Lesson 1. Below is the cURL command from Firefox:

curl 'https://storage.googleapis.com/kaggle-competitions-data/kaggle/7327/labels.csv.zip?GoogleAccessId=competitions-data@kaggle-161607.iam.gserviceaccount.com&Expires=1509818286&Signature=kVhHkOVrMgUXDakeDoPPRaa3gCD8FFh5CXzdKsFeQjQTKesS3F5mlBtwi6Pv8kFC4XHi76jHGkQ%2F4vpYoUhQkreUjyUvH3TtviEKHJpXfHxSGfOxX5l%2BC7g9dycIbDFYiX2PRgTcvHCd4QC66pYweAeTos6k2hC0bp0jZLZlSjWeMARDBi%2FXsQCZxYfJgOf%2BeN%2FhEEUdjpjdiFWgsjtTSH%2F%2F0UISUxPDbD3xxqLPMLaDMM5PFx6aLKD5lfF5JLIXDgnflgD9%2BF8AweDjq%2FitUVw6MFQbRzEwNcrklkV8J4ZX0ZFyRiSi7ylbKxjbC0ght8fdWt%2BEktjrtDx2NixXFA%3D%3D' -H 'Host: storage.googleapis.com' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:54.0) Gecko/20100101 Firefox/54.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: en-GB,en;q=0.5' --compressed -H 'Referer: https://www.kaggle.com/c/dog-breed-identification/data' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -o data

It neither downloads the file nor gives any error. Can you please help?

P.S. I am not sure if it is OK to post the command. If there is any security risk, please flag it and I will remove the post.
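
One thing that may be worth checking (an assumption, not something confirmed in this thread): the URL copied from the browser is a signed Google Cloud Storage link, and its Expires query parameter is a Unix timestamp after which the link stops working. Below is a minimal Python sketch for reading that timestamp with the standard library only. The helper names and the shortened example URL are hypothetical; only the Expires value is taken from the command above.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url):
    """Return the expiry time encoded in the URL's Expires parameter (UTC)."""
    query = parse_qs(urlparse(url).query)
    # Expires holds seconds since the Unix epoch.
    return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)


def is_expired(url, now=None):
    """Return True if the link's expiry time is already in the past."""
    now = now or datetime.now(timezone.utc)
    return now >= signed_url_expiry(url)


# Shortened, hypothetical URL; only the Expires value comes from the post.
example = (
    "https://storage.googleapis.com/bucket/labels.csv.zip"
    "?Expires=1509818286&Signature=abc"
)
print(signed_url_expiry(example).isoformat())
print(is_expired(example))
```

If the link has expired, copying a fresh cURL command from the browser and running it straight away may help.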

You can use wget for Crestle.

For Kaggle, I personally like the kaggle-cli tool found here from the previous version of this course.

Since the data is a .zip file, you’d need unzip on Crestle. Crestle doesn’t let you use sudo (which you’d need to install it, as described in the link I mentioned above). Use conda install unzip as a workaround and unzip the file afterwards. :slight_smile:
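
If installing unzip is not an option, the archive can also be extracted with Python's standard zipfile module. This is a minimal sketch, not something suggested in the thread: the function name extract_zip is hypothetical, and the demo builds a tiny made-up archive so it runs without a real download.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(archive_path, dest_dir):
    """Extract every member of a .zip archive into dest_dir.

    Returns the list of member names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Self-contained demo: create a small archive with made-up content, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "labels.csv.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("labels.csv", "id,breed\n")
    print(extract_zip(demo_zip, Path(tmp) / "data"))
```

For a real download, you would call extract_zip with the path of the downloaded file and the directory you want the data in.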

Hello,

I am trying to use kaggle-cli on Crestle to download the Digit Recognizer dataset.

pip3 install kaggle-cli
kg download -u USERNAME -p PASSWORD -c digit-recognizer

and I get “list index out of range” back…

Would anyone be able to help - what am I doing wrong here?

Antti

It seems this issue has been resolved here.

Thanks!

…however, what should I do to make this work in practice with kaggle-cli? :slight_smile:

For others potentially struggling with the same problem: I finally managed to get the data loaded onto Crestle using the cURL trick shown by Jeremy in the Machine Learning course lesson 1 video (at around 23:00). However, I wasn’t able to paste anything (in this case the long cURL command) into the Crestle terminal using Firefox. When I tried with Internet Explorer, the “paste” option became available when right-clicking…

pip install git+https://github.com/floydwch/kaggle-cli.git should fix the kaggle-cli problem.

Thanks, everyone, that worked. It would also be great if @anurag could provide a way to upload zip files for a dataset. That would let us preprocess the dataset ourselves and then upload it to Crestle.

For me this had to be pip3 install. Just posting in case anybody else has that issue.

I am having trouble downloading the Dog Breed Identification dataset, as shown below.
Please offer suggestions. Thank you.

(fastai) ubuntu@ip-172-31-17-26:~$ kg config -g -u 'bdekoven' -p 'xxxx' -c 'Dog Breed Identification'
(fastai) ubuntu@ip-172-31-17-26:~$ kg download
competition not found
(fastai) ubuntu@ip-172-31-17-26:~$ kg config -g -u 'bdekoven' -p 'xxxx' -c Dog-Breed-Identification
(fastai) ubuntu@ip-172-31-17-26:~$ kg download
'NoneType' object has no attribute 'find_all'
(fastai) ubuntu@ip-172-31-17-26:~$

Try doing a pip install kaggle-cli --upgrade and see if it resolves the issue.

Also, see similar issues here and here. Maybe these will help? :slight_smile:

Also it may(?) be case sensitive: dog-breed-identification
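
The lowercase, hyphenated name is the slug that appears after /c/ in the competition URL from the first post. Below is a minimal sketch for pulling it out of the URL, assuming that URL layout. The function name competition_slug is hypothetical and not part of kaggle-cli.

```python
from urllib.parse import urlparse


def competition_slug(url):
    """Return the path segment that follows /c/ in a Kaggle competition URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Assumed layout: ["c", "<slug>", ...]
    if len(parts) < 2 or parts[0] != "c":
        raise ValueError(f"not a competition URL: {url}")
    return parts[1]


print(competition_slug("https://www.kaggle.com/c/dog-breed-identification/data"))
```

The printed value is what you would pass to the -c option of kg config or kg download.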

@anurag Hi Anurag, would it be too much to ask for dog-breed-identification to be added to the Kaggle datasets on Crestle?

Here is the command line that worked: $ kg config -g -u bdekoven -p xxxx -c dog-breed-identification
Then I could download and unzip.

By the way, I am running on AWS; sorry for posting in a thread that is about Crestle.

Thank you for the suggestions!

The dataset is now available on Crestle under /datasets/kaggle/dog-breed-identification.

Thanks a lot!

Under which directory on Crestle? I can’t find the dataset… And when I try to use kaggle-cli, I get an error like "pkg_resources.DistributionNotFound: The 'lxml<4.1,>=4.0.0' distribution not found and is required by kaggle-cli".

It is in the /datasets/kaggle/dog-breed-identification directory, at the root of the filesystem.

You can fix the lxml issue by installing an older version that’s needed by kaggle-cli:

pip3 install lxml==4.0.0
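
To see whether a given lxml version falls inside the range named in the error message (at least 4.0.0 and below 4.1), a small check like the one below can help. It is only a sketch: the helper names are hypothetical, and it handles plain dotted numeric versions only, not pre-release tags.

```python
def parse_version(text):
    """Turn a plain dotted version such as '4.0.0' into a 3-tuple of ints."""
    numbers = tuple(int(part) for part in text.split("."))
    # Pad or trim to exactly three components so '4.0' compares like '4.0.0'.
    return (numbers + (0, 0, 0))[:3]


def satisfies_lxml_pin(version):
    """Check the constraint from the error message: lxml>=4.0.0,<4.1."""
    return (4, 0, 0) <= parse_version(version) < (4, 1, 0)


for candidate in ["3.8.0", "4.0.0", "4.1.0"]:
    print(candidate, satisfies_lxml_pin(candidate))
```

On Python 3.8 or later, the installed version could be read with importlib.metadata.version("lxml") and passed to satisfies_lxml_pin.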

The problem is not with the NoneType; I guess it is a problem with the password, username, or competition name. When I execute
kg config -g -u bdekoven -p xxxx -c dog-breed-identification
it works.

It works! Thanks!

Glad you got this to work. Sorry I did not respond sooner.