Lesson 3 - Can't Download Planet Data Images Tar Archive

@Jonny5
Thanks for your solution. I was able to download the .tar.7z file after a long struggle.
However, now when I try to unpack the file at {path} with the following command:

! 7za -bd -y -so x {path}/train-jpg.tar.7z | tar xf - -C {path.as_posix()}

I’m getting the following error:

ERROR: /home/jupyter/.fastai/data/planet/train-jpg.tar.7z
/home/jupyter/.fastai/data/planet/train-jpg.tar.7z
Open ERROR: Can not open the file as [7z] archive
ERRORS:
Is not archive
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors

Can anyone tell me what is wrong? I wonder what other people did to proceed.
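Not a fix from the thread itself, but a quick way to diagnose this particular error: a genuine .7z archive begins with the two ASCII magic bytes `7z`, whereas a failed authenticated download from Kaggle is typically a saved HTML login page. A small sketch (the helper name is my own; the path is the one used in this thread):

```shell
# Report whether a file looks like a real 7z archive or something else
# (e.g. an HTML login page saved in place of the archive).
check_archive() {
  # Real 7z archives begin with the two ASCII bytes "7z".
  if [ "$(head -c 2 "$1")" = "7z" ]; then
    echo "looks like a 7z archive"
  else
    echo "NOT a 7z archive (inspect it: it may be an HTML login page)"
  fi
}

# Hypothetical usage with the path from this thread:
# check_archive /home/jupyter/.fastai/data/planet/train-jpg.tar.7z
```

If it reports that the file is not an archive, the problem is the download itself, not permissions or the unpack command.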

It is most likely a file permission problem. Before unzipping your file, run this

!chmod 600  /home/jupyter/.fastai/data/planet/train-jpg.tar.7z

@farid
No, the problem still exists; same error. Is there anything else that might be the cause?
Thanks :slight_smile:

@farid @Jonny5

I think I see the problem. The method of downloading the .7z file suggested by @Jonny5 apparently did not work for me.
When I ran the command

!wget --load-cookies data/planet/cookies.txt \
    {url} \
    -O {path}/train-jpg.tar.7z

I get the following message

--2020-02-12 09:10:31--  https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/download-directory/fBesYSh7qE3PuxXtB1SS%2Fversions%2FDMmq3a6XjGpH6e8EUe3c%2Fdirectories%2Ftrain-jpg.tar
Resolving www.kaggle.com (www.kaggle.com)... 35.244.233.98
Connecting to www.kaggle.com (www.kaggle.com)|35.244.233.98|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.kaggle.com/account/login?ReturnUrl=%2Fc%2Fplanet-understanding-the-amazon-from-space%2Fdownload-directory%2FfBesYSh7qE3PuxXtB1SS%2Fversions%2FDMmq3a6XjGpH6e8EUe3c%2Fdirectories%2Ftrain-jpg.tar [following]
--2020-02-12 09:10:31--  https://www.kaggle.com/account/login?ReturnUrl=%2Fc%2Fplanet-understanding-the-amazon-from-space%2Fdownload-directory%2FfBesYSh7qE3PuxXtB1SS%2Fversions%2FDMmq3a6XjGpH6e8EUe3c%2Fdirectories%2Ftrain-jpg.tar
Reusing existing connection to www.kaggle.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘/home/jupyter/.fastai/data/planet/train-jpg.tar.7z’

/home/jupyter/.fast [ <=> ] 8.76K --.-KB/s in 0.009s

2020-02-12 09:10:32 (961 KB/s) - ‘/home/jupyter/.fastai/data/planet/train-jpg.tar.7z’ saved [8973]

Now when I checked the size of my path directory, it's only 1.6 MB, which is just the size of the .csv file. So apparently the downloaded .7z file does not contain any data: the wget log shows a 302 redirect to the Kaggle login page and a [text/html] response, so the 8973 bytes it saved are the login page, not the archive.

Any suggestions?

Did you search for the solution on the forum?
https://forums.fast.ai/search?q=planet%20category%3A20

Following the suggestion of using wget, here is what I did to download to the expected folder, without any Chrome plugin:

  1. Go to the contest page
  2. Open Chrome Developer Tools (go to the menu > More tools > Developer Tools) and go to the Network tab
  3. On the Kaggle contest page click the “Download All” button in the Download section
  4. Cancel the download, click the “download-all” row in the Developer Tools and look for “cookie” under “Request headers”. Copy all the content of the “cookie” header and replace “PASTE_THE_COOKIE_HERE” in the command below
  5. Get the download link of the file by right clicking the download button for the “train-jpg.tar” file and replace “PASTE_LINK_HERE” in the command below
  6. Paste this whole command in your jupyter notebook and it will download the set to the expected folder
wget -O {path}/train-jpg.tar.7z \
--header="Cookie: PASTE_THE_COOKIE_HERE" \
PASTE_LINK_HERE
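Not part of the original steps, but a sanity check that would have caught the failure earlier in this thread: the real archive is far larger than the 8973-byte login page that wget saved above. A sketch (the helper name and threshold are my own; pick any threshold comfortably above a few kilobytes):

```shell
# Succeed only if the downloaded file exceeds a minimum size in bytes.
# A file in the kilobyte range is almost certainly the Kaggle login
# page rather than the archive.
min_size_ok() {
  local size
  size=$(wc -c < "$1")
  [ "$size" -gt "$2" ]
}

# Hypothetical usage with the path from this thread:
# min_size_ok /home/jupyter/.fastai/data/planet/train-jpg.tar.7z 1000000 \
#   && echo "size looks plausible" \
#   || echo "too small: cookie probably expired, grab a fresh one"
```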

Hey @methodmatters! I'm a bit of a newbie here. How did you upload the file into GCP? I'm trying to figure out how to access the folder '/home/jupyter/.fastai/data/planet'. Thanks! :slight_smile:

Long story short - SSH into your virtual machine. From that window, you can choose the “cog” in the upper right-hand corner. This will give the option to manually select which file to upload. You can navigate to the file on your computer, and ask it to upload.

This will upload the file into your root directory in GCP. You'll need to manually copy the files to the directory referenced in the notebook. Nothing hugely complicated here. Again in the SSH window you created earlier, you can navigate to wherever the file is and move it to the correct directory (whichever one is referenced in the course notebooks). The commands are basic Linux, e.g. cp for copy, mv for move. A quick Google search should get you the basics of how it works.
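To make the cp/mv step concrete, here is a minimal sketch. The helper name and source location are my own illustration; the destination is the path used throughout this thread:

```shell
# Create the directory the notebook expects and move the uploaded
# archive into it.
move_into_place() {
  local src=$1 dest=$2
  mkdir -p "$dest"
  mv "$src" "$dest/"
}

# Hypothetical usage: a file uploaded via the SSH window typically
# lands in your home directory on the GCP instance.
# move_into_place ~/train-jpg.tar.7z /home/jupyter/.fastai/data/planet
```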

Thanks, this worked for me. I also needed to install wget in my jupyter notebook - followed these instructions: https://community.paperspace.com/t/wget-missing-from-jupyter/710

Just in case people ask and use Colab: if you search for my post on lesson 3 on the forum, you will see that I uploaded the training dataset to my Google Drive and shared it.

It’s not necessary to use GCP CLI to upload files.

Have you tried @Jonny5’s way? It works for me. I’m using GCP too.

Grab the cookies.txt with the Chrome extension, then upload cookies.txt to your instance on GCP. You can upload this file with Jupyter Notebook:
[screenshot: the upload button in the Jupyter Notebook file browser]

Then, run all the rest of the commands in Jupyter Notebook.


Hey @clarisli! Welcome to the community :smiley:

I tried using the method you referred to but there were some challenges downloading the data. I’m not sure what happened to be honest.

I’m still super new to programming so my biggest challenge is getting the data into the right folder.

I ended up uploading all the zipped files into a folder in the jupyterlab notebook and it worked fine. Took me a while to find and assign the folder.

Let me know how it goes and whether you found an easier way!

Hi @Jonny5, it worked! Thanks! Could you please explain why it works?

I have uploaded it to Google Drive. You can find it by searching the forum.


Hi @sergiogaitan, I need your help.
When we upload the cookies.txt file manually from our local machine, it gets uploaded into the content folder.
Then is this command correct:

! wget --load-cookies content/cookies.txt {path} -O {path}/train-jpg.tar

or do we need to move cookies.txt into the .fastai/data directory first?

Hi @Jonny5 Thanks, your suggested steps worked :+1:

Yes - I am having massive problems with the download process. I love the tutorials, but I must admit I am losing massive amounts of time trying to get the requisite data files into my notebooks. So nothing to add except a feeling of extraordinary frustration!

Have you searched the forum? Someone uploaded the training file to Google Drive for you. Or do you want people to upload it to Dropbox so you don't need to spend time checking Google Drive?

Hi Jitendra, I uploaded cookies.txt into the content folder and then used the wget command; it worked fine for me.

Thanks. This worked for me.