Lesson 3 - Can't Download Planet Data Images Tar Archive

Long story short - SSH into your virtual machine. From that window, you can choose the “cog” in the upper right-hand corner. This gives you the option to manually select a file to upload. You can navigate to the file on your computer and ask it to upload.

This will upload the file into your root directory on GCP. You’ll need to manually copy the files to the directory referenced in the notebook. Nothing hugely complicated here. Again, in the SSH window you created earlier, you can navigate to wherever the file is and move it to the correct directory (whichever one is referenced in the course notebooks). The commands are basic Linux - e.g. cp for copy, mv for move. A quick Google search should get you the basics of how they work.
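
For example, the move might look something like this (the file name and target directory are just placeholders - use whatever your notebook points at):

# move the uploaded archive from your home directory into the folder the notebook reads from
mv ~/train-jpg.tar.7z ~/.fastai/data/planet/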

Thanks, this worked for me. I also needed to install wget in my jupyter notebook - followed these instructions: https://community.paperspace.com/t/wget-missing-from-jupyter/710

Just in case people ask and are using Colab: if you search for my post on lesson 3 on the forum, you will see that I uploaded the training dataset to my Google Drive and shared it.

It’s not necessary to use the GCP CLI to upload files.

Have you tried @Jonny5’s way? It works for me. I’m using GCP too.

Grab the cookies.txt with the Chrome extension, then upload cookies.txt to your instance on GCP. You can upload the file through the Jupyter Notebook interface.

Then, run all the rest of the commands in Jupyter Notebook.
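
The wget call then looks roughly like this ({url} is the download link grabbed from Kaggle, {path} is wherever your notebook stores the planet data, and the cookies.txt path is wherever you uploaded it):

! wget --load-cookies cookies.txt {url} -O {path}/train-jpg.tar.7z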


Hey @clarisli! Welcome to the community :smiley:

I tried using the method you referred to but there were some challenges downloading the data. I’m not sure what happened to be honest.

I’m still super new to programming so my biggest challenge is getting the data into the right folder.

I ended up uploading all the zipped files into a folder in the jupyterlab notebook and it worked fine. Took me a while to find and assign the folder.

Let me know how you go and if you found an easier way!

Hi @Jonny5, it worked! Thanks! Could you please explain why it works?

I have uploaded it to Google Drive. You can find it by searching the forum.


Hi @sergiogaitan, I need your help.
When we upload the cookies.txt file manually from our local machine, it gets uploaded into the content folder…
Then, is this code correct:

! wget --load-cookies content/cookies.txt {path} -O {path}/train-jpg.tar

or do we first need to move cookies.txt into the .fastai/data directory?

Hi @Jonny5 Thanks, your suggested steps worked :+1:

Yes - I am having massive problems with the download process. I love the tutorials, but I must admit I am losing a huge amount of time trying to get the requisite data files into my notebooks. So nothing to add except a feeling of extraordinary frustration!

Have you searched the forum? Someone has already uploaded the training file to Google Drive for you. Or do you want people to upload it to Dropbox instead, so you don’t need to spend time checking Google Drive?

Hi Jitendra, I uploaded cookies.txt to the content folder and then used the wget command; it worked fine for me.
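
In other words, --load-cookies just needs the path where you put the file, something like this (with the same {url}/{path} placeholders as before):

! wget --load-cookies content/cookies.txt {url} -O {path}/train-jpg.tar.7z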

Thanks. This worked for me

Hi @PalaashAgrawal Did you manage to solve it? I’m facing the same problem on Colab. I did what @Jonny5 proposed and did not get any error, but it is still not working.

@PabloMC I am also getting the same error.

I have not been able to solve the problem, but I am unsure whether you may have an additional one due to Google Drive. Notice that it says

ERROR: no more files in /content/drive/My

which looks like the path is being cut off at the space in “My Drive”.

I did it by uploading the cookie as @Jonny5 suggested, and then:

url = 'The cookie url'
! wget --load-cookies /content/cookies.txt {url} -O {path}/train-jpg.tar.7z

But as I said, that does not solve the 7z untarring step:

 ! 7za -bd -y -so x '{path}'/train-jpg.tar.7z | tar xf - -C {path.as_posix()}

and it still returns the same error.
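
If the archive really does sit under Google Drive, the space in “My Drive” has to stay inside quotes or the shell splits the path; a minimal sketch with a hypothetical Drive location:

drive_path = '/content/drive/My Drive/planet'  # hypothetical location of the archive
! 7za -bd -y -so x '{drive_path}/train-jpg.tar.7z' | tar xf - -C {path.as_posix()}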

You can try a simpler solution. Someone uploaded the dataset separately as a zip file. https://www.kaggle.com/nikitarom/planets-dataset
You can get this file using the wget command, or the Kaggle API command directly.
Cheers!
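
If you go the Kaggle API route, you need your API token in place first; the usual setup looks like this (kaggle.json is the token file downloaded from your Kaggle account page):

! pip install -q kaggle
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json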


It works! Many thanks @PalaashAgrawal
One writes, for example

! kaggle datasets download nikitarom/planets-dataset -p "{path}"
! unzip -q -n '{path}'/planets-dataset.zip -d '{path}' 

And

(path/'planet'/'planet').ls()

will list the files in the folder. You may also use that solution, @adit007.


I found a solution. This applies to any dataset from Kaggle, so it’s a permanent solution (used in Google Colab).

  1. Go to the competition page where you want to download the .tar from.
  2. Press F12 and go to the Network panel.
  3. Start the download and cancel it.
  4. You will see a request called train-jpg.tar.7z?..
  5. Right click -> Copy as cURL (bash).
  6. Paste it into the notebook and put an ! mark in front (a rough sketch of what it ends up looking like is below, after this list).
  7. Very important: add --get at the end of the command.
    I don’t know much bash, I just experimented around.
    Took me 3 hours to find this. It’s working smoothly.
    After that you can use:
    !p7zip -d train-jpg.tar.7z
    !tar -xvf train-jpg.tar
    This will extract the data to your path.
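
A rough sketch of what the pasted command looks like, just for orientation: the URL here is only illustrative, your copied command will contain many more -H headers taken from your own browser session, and you will want a -o flag so curl saves the download to a file instead of printing it:

! curl 'https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/download/train-jpg.tar.7z' -H 'cookie: <your session cookies>' --compressed --get -o train-jpg.tar.7z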

Hi, I ran into the same error. Here is an approach that worked.

Somehow the file size of train-jpg.tar.7z was showing up smaller than on Kaggle, hence the error about being unable to open the file. I changed the method of copying the file over to the remote machine (where I run the notebook).

Steps:

  1. Download the file train-jpg.tar.7z from Kaggle directly.
  2. Copy it over to the remote machine using the scp command:
    gcloud compute scp ~/Downloads/train-jpg.tar.7z @my-instance:/home/jupyter/.fastai/data/planet/ --zone us-west1-b

If using GCP, you can find the documentation for step 2 at https://cloud.google.com/compute/docs/gcloud-compute#connecting

And this time all the contents must have copied correctly, because the command to unpack the data worked. Hope this helps.
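
One quick sanity check before unpacking is to compare the size on disk with the size Kaggle lists for the file, e.g.:

! ls -lh /home/jupyter/.fastai/data/planet/train-jpg.tar.7z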
