How to download a dataset to PaperSpace

Zora · October 22, 2020, 2:24pm

Hi Guys,

I am taking the Practical Deep Learning for Coders course. I am new to the field, and I am stuck on how to load or download my dataset for practice.

I am using a Gradient notebook in Paperspace. I uploaded my data to Paperspace, following all the steps in this Paperspace tutorial: DeepDesign2019/paperspace_tutorials/Paperspace_uploadingdata.md at master · alexacarlson/DeepDesign2019 · GitHub

Now I am not sure how to download my dataset in the notebook. I tried the commands in the fastai documentation (https://docs.fast.ai/data.external and https://docs.fast.ai/tutorial.vision), but none of them seem to work in my case. I always get an error saying that there is no directory with that name, or the list of files has length zero. Can you please tell me which command I should use to download a dataset that I uploaded to Paperspace?

Also, I am always confused about using the datasets that come with the fastai library by default. I know we used one in the first lesson of the course, but the dataset names (listed at https://docs.fast.ai/data.external) that I use with the URLs class do not always work (error: there is no directory with this name). Can anyone tell me what my mistake is?

P.S. I have also read the following topics on the forum, but I think I need more explanation here.

Thank you,

vivekharshey1 · April 27, 2021, 1:00pm

I also faced the problem of downloading a dataset onto a virtual machine. Here are some ways to do it:

1. First, we can use fastai's untar_data function to both download and extract the data. It works well for the standard datasets used in the fastai course, which are stored in the cloud as gzip-compressed archives. However, I could not get it to download a file from Google Drive.
2. Python code that can extract any zip file is available here; I have tested it:
https://stackoverflow.com/questions/3451111/unzipping-files-in-python
3. So the problem boils down to downloading the dataset. We can use the wget command, which is very fast: I downloaded the 3 GB BACH test dataset in less than 5 minutes. https://zenodo.org/record/3632035/files/ICIAR2018_BACH_Challenge_TestDataset.zip
However, this does not work directly with a Google Drive shared link.
4. To download from Google Drive, we can use the following bash script, which uses the curl command.
(NB: Select this code, copy it, and paste it into the terminal with Shift+Ctrl+V or the right-click paste option.)

#!/bin/bash
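fileid="file id of Google drive shared file"
filename="Write your filename"
# First request: store the cookie that Google Drive sets for the download.
curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null
# Second request: pass the confirm token from the cookie file and save the file.
curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=$(awk '/download/ {print $NF}' ./cookie)&id=${fileid}" -o "${filename}"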

Replace the fileid value with the Google Drive ID of the file, and set filename to the name you want to give the downloaded file, both in double quotes. Make sure the file is shared publicly (it must have edit permission); only then does the script work. I have tested it and it works fine.
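In case it helps with finding the file ID, here is a minimal Python sketch that pulls it out of a share link and builds the download URL used in the script above. The helper names drive_file_id and drive_download_url are made up for this example, and the two link shapes it recognizes (".../file/d/ID/view" and "...?id=ID") are assumptions about what share links commonly look like, so check the result against your own link.

```python
import re
from typing import Optional


def drive_file_id(share_link: str) -> Optional[str]:
    """Return the file ID found in a Google Drive share link, or None."""
    # Assumed shape 1: https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_link)
    if match:
        return match.group(1)
    # Assumed shape 2: https://drive.google.com/open?id=<ID>
    match = re.search(r"[?&]id=([A-Za-z0-9_-]+)", share_link)
    return match.group(1) if match else None


def drive_download_url(file_id: str) -> str:
    """Build the same uc?export=download URL that the bash script requests."""
    return f"https://drive.google.com/uc?export=download&id={file_id}"
```

The returned ID is what goes into fileid in the bash script.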

After downloading the zip file, you can extract it with the unzip command (or tar, for .tar.gz archives) or with the Python method in step 2.
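As a rough sketch of the Python route, the standard zipfile module can do the extraction, and listing the extracted files afterwards is a quick way to check that the folder is not empty before pointing fastai at it. The function name extract_zip and the demo file names are only for illustration; the demo builds a tiny throwaway archive so that it runs without any download.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the extracted files, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Search recursively so files inside nested folders are found too.
    return sorted(p for p in dest.rglob("*") if p.is_file())


# Demo on a throwaway archive (hypothetical file names, fake contents).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("images/cat_1.jpg", b"fake image bytes")
        archive.writestr("images/dog_1.jpg", b"fake image bytes")
    files = extract_zip(demo_zip, Path(tmp) / "data")
    extracted_names = [f.name for f in files]
    print(len(files), extracted_names)
```

If the returned list is empty, the archive was probably extracted somewhere else or contains another level of folders than expected.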

The wget command can also be used to download a Google Drive file.
(Files larger than 100 MB count as large files.) Also change docs.google to drive.google in the commands below.

For large files, run the following command, replacing FILEID and FILENAME as needed:
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=FILEID" -O FILENAME && rm -rf /tmp/cookies.txt

For small files, run the following command in your terminal:
wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O FILENAME

https://bcrf.biochem.wisc.edu/2021/02/05/download-google-drive-files-using-wget/
It works, I have tested it.

Besides Google Drive, you can use wget to download a file from any direct link:
wget https://zenodo.org/record/3632035/files/ICIAR2018_BACH_Challenge_TestDataset.zip -O 'ICIAR2018_BACH_Challenge_TestDataset.zip'
You can also download it with the curl command:
curl -o ICIAR2018_BACH_Challenge_TestDataset.zip https://zenodo.org/record/3632035/files/ICIAR2018_BACH_Challenge_TestDataset.zip
(example code for the BACH dataset)
https://www.howtogeek.com/447033/how-to-use-curl-to-download-files-from-the-linux-command-line/
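If you would rather stay inside the notebook than switch to the terminal, something like the following should behave roughly like the wget and curl commands above for a direct link. It is only a sketch using the Python standard library; download_file is a made-up helper name, and it does not handle the Google Drive confirmation step.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest):
    """Stream the content at url into dest and return the destination path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj copies in chunks, so a large archive is not held in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example call for the BACH test dataset (about 3 GB, so left commented out):
# download_file(
#     "https://zenodo.org/record/3632035/files/ICIAR2018_BACH_Challenge_TestDataset.zip",
#     "ICIAR2018_BACH_Challenge_TestDataset.zip",
# )
```

In my experience the command-line tools are still the more convenient choice for very large files, since they show progress as they go.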

Zora · April 27, 2021, 2:00pm

@vivekharshey1

Thank you very much!

vivekharshey1 · April 27, 2021, 2:57pm

Glad to help.