I was working through the week 1 notebook and have a question: how can I download a dataset and work with it? In my case, I want to download the Devanagari dataset, which is like the MNIST dataset but for the Hindi alphabet.
I want to know how to download it and start working with it. I have tried the untar_data() and download_data() functions, but they are not working for me. Any help would be appreciated.
I have managed to download the dataset with the download_data() function. It downloaded a .zip.tgz file, and now I don’t know how to decompress it into files and folders. Any ideas?
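One way to approach the decompression step is to ignore the suffix and check what the file actually contains. The sketch below uses only the Python standard library. The function name extract_archive is made up for illustration, and it rests on the unverified assumption that the .zip.tgz file is either a plain zip or a (possibly gzipped) tar archive.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar or zip archive into dest_dir, judging by content, not suffix.

    Hypothetical helper: returns "tar" or "zip" depending on what was found.
    """
    archive_path = Path(archive_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Check for tar first; tarfile.open() auto-detects gzip/bz2/xz compression.
    if tarfile.is_tarfile(archive_path):
        with tarfile.open(archive_path) as tf:
            tf.extractall(dest_dir)
        return "tar"

    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest_dir)
        return "zip"

    raise ValueError(f"{archive_path} is neither a tar nor a zip archive")
```

In a notebook this would be called with the path that download_data() returned and a destination folder of your choice. Only extract archives from sources you trust, since extractall() writes whatever paths the archive contains.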
The problem is that this dataset is not a good pick right after the first lesson. They don’t even provide labels, or at least I couldn’t find them. I’m going to create my own little toy dataset now.
I ran into the problem of downloading a dataset onto a virtual machine. These are some of the ways to do it:
First of all, we can use fastai’s untar_data method, which both downloads and untars the data. It works well for the standard datasets used in the fastai course, which are stored in the cloud as gzipped tar archives. However, I could not get this method to download a file from Google Drive.
So the problem boils down to downloading the dataset. We can use the wget command for that. It is very fast: I downloaded the BACH test dataset, a 3 GB file, in less than 5 minutes. https://zenodo.org/record/3632035/files/ICIAR2018_BACH_Challenge_TestDataset.zip
However, this does not work directly with a Google Drive shared link.
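If you would rather stay inside the notebook than switch to a terminal, a direct download like the wget one can be sketched with the Python standard library. This is only a minimal sketch: download_file is a hypothetical name, it has no progress bar or retry logic, and it has the same limitation as plain wget for Google Drive shared links.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest_path, chunk_size=1024 * 1024):
    """Stream the content at url into dest_path and return the destination path.

    Hypothetical helper: copies in chunks so a multi-GB file is never held in memory.
    """
    dest_path = Path(dest_path)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path


# Example call (not executed here; it would fetch the 3 GB BACH test set):
# download_file(
#     "https://zenodo.org/record/3632035/files/ICIAR2018_BACH_Challenge_TestDataset.zip",
#     "data/ICIAR2018_BACH_Challenge_TestDataset.zip",
# )
```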
To download from Google Drive, we can use the following bash script, which uses the curl command:
(NB: Just select this code, copy it, and paste it into the terminal with Shift+Ctrl+V or the right-click paste option.)
You must replace the file id with the Google Drive id of the file, and the filename, in double quotes, is the name you want to give the downloaded file. Make sure the file is shared publicly (it must have edit permission); only then does it work. I have tested it and it works fine.
After downloading the zip file, you can unzip it with the tar command or the method in step 2.
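In case it is unclear where the file id comes from: it is the long token inside the shared link. The sketch below pulls it out and builds a direct-download URL. Both function names are made up for illustration, and the two link shapes it handles (.../file/d/ID/view and ...open?id=ID) are just the common forms I am assuming; other link formats would need extra cases.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse


def drive_file_id(share_link):
    """Return the file id embedded in a Google Drive shared link (hypothetical helper)."""
    parsed = urlparse(share_link)

    # Shape 1 (assumed): https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = re.search(r"/file/d/([^/]+)", parsed.path)
    if match:
        return match.group(1)

    # Shape 2 (assumed): https://drive.google.com/open?id=<ID>
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]

    raise ValueError(f"could not find a file id in {share_link!r}")


def drive_download_url(file_id):
    """Build a uc?export=download URL for the given file id."""
    query = urlencode({"export": "download", "id": file_id})
    return "https://drive.google.com/uc?" + query
```

The resulting URL is what you would pass to curl or wget in place of the shared link.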
The wget command can also be used to download a Google Drive file.
(Files > 100 MB count as large files.) Also change docs.google to drive.google.
For large files, run the following command with the necessary changes to FILEID and FILENAME:
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=FILEID" -O FILENAME && rm -rf /tmp/cookies.txt
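To make the sed part of that command less mysterious: the inner wget fetches the warning page that Google Drive shows for large files, and the sed expression pulls the confirm token out of it so the outer wget can send it back. A rough Python equivalent of just that extraction step is sketched below. The function name confirm_token is hypothetical, and the page layout is an assumption based only on the regex in the command; Google may change it at any time.

```python
import re


def confirm_token(page_html):
    """Return the first confirm token found in the page text, or None.

    Mirrors the sed pattern line by line: the greedy '.*' means that, within
    a line, the last 'confirm=' followed by token characters is the one captured.
    """
    for line in page_html.splitlines():
        match = re.match(r".*confirm=([0-9A-Za-z_]+)", line)
        if match:
            return match.group(1)
    return None
```

If this returns None, the page had no confirm token on it, which is what I would expect for small files that download directly.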