Lesson 3 - Can't Download Planet Data Images Tar Archive

Long story short - SSH into your virtual machine. From that window, you can choose the “cog” in the upper right-hand corner. This gives you the option to manually select a file to upload. You can navigate to the file on your computer and ask it to upload.

This will upload the file into your root directory on GCP. You’ll need to manually copy the files to the directory referenced in the notebook. Nothing hugely complicated here. Again, in the SSH window you created earlier, you can navigate to wherever the file is and move it to the correct directory (whichever one is referenced in the course notebooks). The commands are basic Linux - e.g. cp for copy, mv for move. A quick Google search should get you the basics of how they work.
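
For example, the move might look something like this (the file name and target directory are just placeholders - use whatever your notebook points at):

# move the uploaded archive from your home directory into the folder the notebook reads from
mv ~/train-jpg.tar.7z ~/.fastai/data/planet/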

Thanks, this worked for me. I also needed to install wget in my jupyter notebook - followed these instructions: https://community.paperspace.com/t/wget-missing-from-jupyter/710

Just in case people ask and are using Colab: if you search for my post on lesson 3 on the forum, you will see that I uploaded the training dataset to my Google Drive and shared it.

It’s not necessary to use the GCP CLI to upload files.

Have you tried @Jonny5’s way? It works for me. I’m using GCP too.

Grab the cookies.txt with the Chrome extension, then upload cookies.txt to your instance on GCP. You can upload the file through the Jupyter Notebook interface.

Then, run all the rest of the commands in Jupyter Notebook.
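
The wget call then looks roughly like this ({url} is the download link grabbed from Kaggle, {path} is wherever your notebook stores the planet data, and the cookies.txt path is wherever you uploaded it):

! wget --load-cookies cookies.txt {url} -O {path}/train-jpg.tar.7z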


Hey @clarisli! Welcome to the community :smiley:

I tried using the method you referred to but there were some challenges downloading the data. I’m not sure what happened to be honest.

I’m still super new to programming so my biggest challenge is getting the data into the right folder.

I ended up uploading all the zipped files into a folder in the jupyterlab notebook and it worked fine. Took me a while to find and assign the folder.

Let me know how you go and if you found an easier way!

Hi @Jonny5, it worked! Thanks! Could you please explain why it works?

I have uploaded it to Google Drive. You can find it by searching the forum.


Hi @sergiogaitan, I need your help.
When we upload the cookies.txt file manually from our local machine, it gets uploaded into the content folder…
Then, is this code correct:

! wget --load-cookies content/cookies.txt {path} -O {path}/train-jpg.tar

or do we first need to move cookies.txt into the .fastai/data directory?

Hi @Jonny5 Thanks, your suggested steps worked :+1:

Yes - I am having massive problems with the download process. I love the tutorials, but I must admit I am losing a huge amount of time trying to get the requisite data files into my notebooks. So nothing to add except a feeling of extraordinary frustration!

Have you searched the forum? Someone has already uploaded the training file to Google Drive for you. Or do you want people to upload it to Dropbox instead, so you don’t need to spend time checking Google Drive?

Hi Jitendra, I uploaded cookies.txt to the content folder and then used the wget command; it worked fine for me.
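
In other words, --load-cookies just needs the path where you put the file, something like this (with the same {url}/{path} placeholders as before):

! wget --load-cookies content/cookies.txt {url} -O {path}/train-jpg.tar.7z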

Thanks. This worked for me

Hi @PalaashAgrawal Did you manage to solve it? I’m facing the same problem on Colab. I did what @Jonny5 proposed and did not get any error, but it is still not working.

@PabloMC I am also getting the same error.

I have not been able to solve the problem, but I am unsure whether you may have an additional one due to Google Drive. Notice that it says

ERROR: no more files in /content/drive/My

which looks like the path is being cut off at the space in “My Drive”.

I did it by uploading the cookie as @Jonny5 suggested, and then:

url = 'The cookie url'
! wget --load-cookies /content/cookies.txt {url} -O {path}/train-jpg.tar.7z

But as I said, that does not solve the 7z untarring step:

 ! 7za -bd -y -so x '{path}'/train-jpg.tar.7z | tar xf - -C {path.as_posix()}

and it still returns the same error.
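
If the archive really does sit under Google Drive, the space in “My Drive” has to stay inside quotes or the shell splits the path; a minimal sketch with a hypothetical Drive location:

drive_path = '/content/drive/My Drive/planet'  # hypothetical location of the archive
! 7za -bd -y -so x '{drive_path}/train-jpg.tar.7z' | tar xf - -C {path.as_posix()}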

You can try a simpler solution. Someone uploaded the dataset separately as a zip file. https://www.kaggle.com/nikitarom/planets-dataset
You can get this file using the wget command, or the Kaggle API command directly.
Cheers!
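
If you go the Kaggle API route, you need your API token in place first; the usual setup looks like this (kaggle.json is the token file downloaded from your Kaggle account page):

! pip install -q kaggle
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json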


It works! Many thanks @PalaashAgrawal
One writes, for example

! kaggle datasets download nikitarom/planets-dataset -p "{path}"
! unzip -q -n '{path}'/planets-dataset.zip -d '{path}' 

And

(path/'planet'/'planet').ls()

will list the files in the folder. You may also use that solution, @adit007.


I found a solution. This applies to any dataset from Kaggle, so it’s a permanent solution (used in Google Colab).

  1. Go to the competition page where you want to download the .tar from.
  2. Press F12 and go to the Network panel.
  3. Start the download and cancel it.
  4. You will see a request called train-jpg.tar.7z?..
  5. Right click -> Copy as cURL (bash).
  6. Paste it into the notebook and put an ! mark in front (a rough sketch of what it ends up looking like is below, after this list).
  7. Very important: add --get at the end of the command.
    I don’t know much bash, I just experimented around.
    Took me 3 hours to find this. It’s working smoothly.
    After that you can use:
    !p7zip -d train-jpg.tar.7z
    !tar -xvf train-jpg.tar
    This will extract the data to your path.
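
A rough sketch of what the pasted command looks like, just for orientation: the URL here is only illustrative, your copied command will contain many more -H headers taken from your own browser session, and you will want a -o flag so curl saves the download to a file instead of printing it:

! curl 'https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/download/train-jpg.tar.7z' -H 'cookie: <your session cookies>' --compressed --get -o train-jpg.tar.7z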

Hi, I ran into the same error. Here is an approach that worked.

Somehow the file size of train-jpg.tar.7z was showing up smaller than on Kaggle, hence the error about being unable to open the file. I changed the method of copying the file over to the remote machine (where I run the notebook).

Steps:

  1. Download the file train-jpg.tar.7z from Kaggle directly.
  2. Copy it over to the remote machine using the scp command:
    gcloud compute scp ~/Downloads/train-jpg.tar.7z @my-instance:/home/jupyter/.fastai/data/planet/ --zone us-west1-b

If using GCP, you can find the documentation for step 2 at https://cloud.google.com/compute/docs/gcloud-compute#connecting

And this time all the contents must have copied correctly, because the command to unpack the data worked. Hope this helps.
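
One quick sanity check before unpacking is to compare the size on disk with the size Kaggle lists for the file, e.g.:

! ls -lh /home/jupyter/.fastai/data/planet/train-jpg.tar.7z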
