Since we will all be using the Planet dataset for Lesson 2, I thought it would help to write down the steps for getting it onto AWS. I have done this and was able to run the notebook successfully. Hope this helps.
1. Install the Kaggle CLI (if it's already installed, go to step 2): pip install kaggle-cli
2. Configure your Kaggle account: kg config -u <your username (most likely your email)> -p <your password> -c <competition name>
Notes:
a. Go to the Kaggle competition website, log in, and accept the competition rules.
b. If you've always signed in to Kaggle through a linked social media account, kaggle-cli will give you an error, because it needs a separate Kaggle username and password. Fortunately, Kaggle has a workaround: select Forgot Password? and you'll receive an email with a few options, the third of which lets you set up your own Kaggle username/password and connects it to your original social media account.
c. How to find the competition name: open the competition's page on the Kaggle website and take the part of the URL after /c/. For example, if the page is https://www.kaggle.com/c/planet-understanding-the-amazon-from-space, the competition name is planet-understanding-the-amazon-from-space.
3. Download the data: kg download
4. Extract the zip files: unzip -q <filename.zip>
5. Extract the tar files: 7za x <filename.tar.7z> extracts the 7z archive and outputs <filename.tar>; then run tar xf <filename.tar>
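The competition-name lookup from note c and the extraction steps above can be wrapped in a couple of shell functions. This is a minimal sketch, not a tested recipe: it assumes kaggle-cli (kg), unzip, 7za (from p7zip) and tar are installed, the function names are hypothetical, and it assumes each .tar.7z unpacks to a .tar of the same name, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes kg, unzip, 7za (p7zip) and tar are on the PATH.

# Note c: the competition name is the path segment right after /c/ in the URL.
competition_from_url() {
  local rest="${1#*/c/}"   # drop everything up to and including "/c/"
  echo "${rest%%/*}"       # keep only the first path segment
}

# Unpack every .zip and .tar.7z found in a directory (default: current dir).
extract_all() {
  local dir="${1:-.}" f
  for f in "$dir"/*.zip; do
    [[ -f "$f" ]] || continue          # glob stays literal when nothing matches
    unzip -q -o "$f" -d "$dir"
  done
  for f in "$dir"/*.tar.7z; do
    [[ -f "$f" ]] || continue
    7za x -y -o"$dir" "$f"             # assumed to write <name>.tar into $dir
    tar -xf "${f%.7z}" -C "$dir"       # then unpack that tar
  done
}

# Usage (email and password are placeholders):
#   comp=$(competition_from_url https://www.kaggle.com/c/planet-understanding-the-amazon-from-space)
#   kg config -u you@example.com -p your-password -c "$comp"
#   kg download
#   extract_all .
```

Only the two helpers are defined here; the kg commands stay in the usage comment so the file can be sourced without triggering a download.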
You only need the following files to run the notebook (as far as I understand for now; @jeremy will probably explain this in the next class):
a. train-jpg
b. test-jpg
c. test-jpg-additional
d. train_v2.csv
e. test_v2_file_mapping.csv
f. sample_submission_v2.csv
I deleted the rest of the files because the instance was running out of disk space, but if you have the space you can keep them in a separate folder under data/planet.
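If you'd rather keep the other files than delete them, something like the following moves everything not on the list above into a subfolder of data/planet. It's a sketch under assumptions: the function name and the default subfolder name "unused" are hypothetical, and it matches entries by exact file or folder name.

```shell
#!/usr/bin/env bash
# Entries the notebook needs, per the list above.
KEEP=(train-jpg test-jpg test-jpg-additional
      train_v2.csv test_v2_file_mapping.csv sample_submission_v2.csv)

# Move every other entry in the data directory into a subfolder
# (hypothetical default name "unused") instead of deleting it.
stash_unused() {
  local dir="$1" sub="${2:-unused}"
  local f name k keep
  mkdir -p "$dir/$sub"
  for f in "$dir"/*; do
    [[ -e "$f" ]] || continue            # empty directory: glob stayed literal
    name=$(basename "$f")
    [[ "$name" == "$sub" ]] && continue  # never move the stash folder into itself
    keep=0
    for k in "${KEEP[@]}"; do
      if [[ "$name" == "$k" ]]; then keep=1; break; fi
    done
    if (( keep == 0 )); then
      mv "$f" "$dir/$sub/"
    fi
  done
}

# Usage: stash_unused data/planet
```

Moving rather than deleting means you can still recover the extra files later, at the cost of the disk space they occupy.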
I keep getting "list index out of range" errors. I've tried switching the competition between dog-breed-identification and planet-understanding-the-amazon-from-space. I'm pretty sure I'm using the correct username and password.
Thank you! That's a definite improvement over the earlier error. However, it now tells me that the download resolves to an HTML document rather than a file. I'm fairly certain I've accepted the competition terms…
Edit: resolved the issue. I was using my Kaggle username instead of the email address I signed up with.
Future users might try: kg config -u <the email you signed up with> -p <your password> -c <competition name>
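Since a bad login can leave you with an HTML page saved under an archive's name, a quick look at the first bytes of each downloaded file can catch the problem before you try to extract it. This is a hedged sketch: the function name is hypothetical, and it assumes such a page starts with a doctype or html tag within its first 512 bytes.

```shell
#!/usr/bin/env bash
# Print any downloaded .zip / .7z whose first bytes look like an HTML page
# rather than a real archive.
find_html_downloads() {
  local dir="${1:-.}" f
  for f in "$dir"/*.zip "$dir"/*.7z; do
    [[ -f "$f" ]] || continue          # skip globs that matched nothing
    if head -c 512 "$f" | LC_ALL=C grep -qiE '<!doctype html|<html'; then
      echo "$f"
    fi
  done
}

# Usage: find_html_downloads .   (no output means nothing looked like HTML)
```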
Probably no advantage at this stage. Here's some information about it if you're interested: https://www.techsupportalert.com/what-is-bittorrent. It's largely for Kaggle's benefit, and it only helps when a competition is active and busy.
I thought I would avoid the download issue by using the files preloaded on Crestle, but then ran into the problem that the test images don't seem to have been uploaded there.
So I got the two 7-Zip files loaded, but then couldn't extract them with the commands given on the competition's data page.
I tried reinstalling 7-Zip but ran into a weird dependency issue: something about the version of lxml being wrong.
Is there any other unzipper that can be used to extract the tar file?
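One possible approach, sketched under assumptions: the inner .tar only needs plain tar; it's the outer 7z container that needs a 7-Zip-capable tool. The function below (name hypothetical) tries 7za, then 7z (both from p7zip), then bsdtar from libarchive, which is a swapped-in suggestion and assumes your libarchive build includes 7-Zip read support. Each path streams the inner tar straight into tar, so the intermediate .tar is never written to disk.

```shell
#!/usr/bin/env bash
# Unpack a .tar.7z with whichever 7z-capable extractor is installed,
# piping the inner tar directly into tar instead of writing it to disk.
extract_tar7z() {
  local archive="$1" dest="${2:-.}"
  if command -v 7za >/dev/null 2>&1; then
    7za x -so "$archive" | tar -xf - -C "$dest"
  elif command -v 7z >/dev/null 2>&1; then
    7z x -so "$archive" | tar -xf - -C "$dest"
  elif command -v bsdtar >/dev/null 2>&1; then
    # assumes libarchive was built with 7-Zip read support
    bsdtar -xOf "$archive" | tar -xf - -C "$dest"
  else
    echo "no 7z-capable extractor found (tried 7za, 7z, bsdtar)" >&2
    return 1
  fi
}

# Usage: extract_tar7z train-jpg.tar.7z data/planet
```

Streaming also sidesteps the disk-space problem mentioned earlier, since the multi-gigabyte .tar never sits next to the .7z.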