Difficulty Downloading Data

TL;DR: I consistently have difficulty downloading data in the notebooks. If I want to download data from a URL that points to a zipped file, how do I do it in fastai? untar_data() never seems to work for me. What is the best way to unzip after using download_data()?

Details:

In the MNIST portion of lesson 5, when I try to execute the code in the notebook, I get the following error:

It seems that mnist is not in the .fastai directory (confirmed in the terminal).

I tried to use untar_data() with the URL provided in the notebook. As with other times I have tried to use untar_data(), it downloads the data, adds a .tgz extension, and then tells me that it is not a zip file. According to other posts in the forum, untar_data() seems to work only with URLs from the URLs class in fastai.

I eventually managed to get the data downloaded with download_data() by setting the parameter ext = ''. (It took me a while to figure this out, and I'm not sure why download_data() adds a .tgz extension by default.)

I then used the code in the notebook, but modified the path, and I got it to work.


This was all very clunky and inefficient. Downloading data must be one of the most common things anyone does in fastai. I feel like there must be a standard or simple way of doing this. How do I download and unzip data efficiently?
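
For reference, the download-then-unpack step can also be done without fastai, using only the Python standard library. The sketch below is not fastai's own method; the helper name fetch_and_extract and its parameters are hypothetical, and it assumes the URL ends in the file name you want to keep. It downloads the file once, then unpacks it according to what the file actually is (zip, tar, or plain gzip) rather than what its extension says.

```python
import gzip
import shutil
import tarfile
import zipfile
from pathlib import Path
from urllib.request import urlretrieve


def fetch_and_extract(url, dest_dir, fname=None):
    """Download `url` into `dest_dir` and unpack it there (hypothetical helper)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: the last URL segment is a sensible local file name.
    fname = fname or url.rstrip("/").split("/")[-1]
    archive = dest_dir / fname

    # Skip the download if the file is already present.
    if not archive.exists():
        urlretrieve(url, str(archive))

    # Detect the format from the file contents, not from the extension.
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest_dir)
    elif tarfile.is_tarfile(archive):
        # Covers .tar, .tgz, .tar.gz, .tar.bz2; only unpack archives you trust.
        with tarfile.open(archive) as tf:
            tf.extractall(dest_dir)
    elif archive.suffix == ".gz":
        # A single gzipped file, e.g. data.pkl.gz -> data.pkl
        target = archive.with_suffix("")
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)

    return dest_dir
```

Calling fetch_and_extract(some_url, some_folder) returns the folder as a Path, so the notebook code can then be pointed at that path. If the file is in none of the three formats, it is simply left in place as downloaded.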


I struggle in the same way. In particular, I don't know how to download data files to a repository that a cloud notebook (one panel, AWS, etc.) can access. It seems like we need some instructions on this…