untar_data() help

Hi. I am looking at the docs for untar_data:

At the bottom, it says this:

[image: screenshot from the untar_data docs]

It appears that URLs is a predefined class that has various datasets associated with it.

So it seems you cannot create your own URLs objects, and you must pass URLs.something to untar_data.

So if you have a .tgz file from an external source, how do you untar it?

Thanks.

URLs.some_url is just a URL stored as a plain string, and untar_data needs a URL so it can download the data and then extract it. Since you already have the data, extracting it only takes a couple of lines:

import tarfile
# tar_file_name: path to the .tgz archive; directory_path: folder to extract into
with tarfile.open(tar_file_name, 'r:gz') as tar:
    tar.extractall(directory_path)

Thanks for the reply.

If I’m understanding you correctly, I can pass a URL to untar_data() as a plain string and it will download and extract the archive. The argument does not need to come from the ‘URLs’ class. When I tried that previously, though, I received the error “not a zipped file” even though it was a .tgz file. Any idea why this would happen? I used URLs taken from the fastai website.

It sounds like a workaround is to download the data and then use tarfile.open(), as you suggest. But should it be possible to use untar_data to download and unzip? I understood the docs to be suggesting that you needed to use the fastai class “URLs” as the argument.

Appreciate your help!

Can you share the URL if possible? I can try it on my workstation.
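One way to narrow down an error like “not a zipped file” is to check whether the downloaded file really is a gzip-compressed tar archive. One possible cause (an assumption, not confirmed in this thread) is that the server returned something else, such as an HTML page, under a .tgz name. The sketch below uses only the Python standard library; describe_archive is a hypothetical helper, not part of fastai.

```python
import tarfile


def describe_archive(path):
    """Return a short label for what kind of file `path` appears to be.

    Hypothetical helper for debugging; not part of fastai.
    """
    path = str(path)
    # is_tarfile() also recognises compressed tar archives such as .tgz.
    if tarfile.is_tarfile(path):
        return "tar archive"
    # Every gzip stream starts with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"\x1f\x8b":
        return "gzip file, but not a tar archive"
    return "not a gzip or tar file"
```

If this reports anything other than "tar archive" for the downloaded file, opening it in a text editor may show what the server actually sent.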

What is the significance of the “annotations” folder that gets created when we use untar_data() to download from a URL?

Just came across this thread while looking for a solution myself.
(This may not be particularly useful since it has been a long time since your question.)

Here’s an example of how one could use untar_data if the dataset comes from an external source. Here we use the download_data helper function from fastai to download the flowers dataset.

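from fastai.vision.all import *  # assumes a fastai release that still exports download_data

url = 'http://download.tensorflow.org/example_images/flower_photos.tgz'
path = download_data(url)  # downloads the archive and returns its local Path
path.as_posix()
# '/home/gg/.fastai/archive/flower_photos.tgz'

data = untar_data(path.as_posix()) # or pass str(path)
data
# Path('/home/gg/.fastai/data/flower_photos')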
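For comparison, the same download-then-extract flow can be sketched with only the Python standard library, which may help if the fastai helpers behave differently in your installed version. This is a minimal sketch, not how fastai implements untar_data; fetch_and_extract and its archive_dir and data_dir parameters are hypothetical names chosen for illustration.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, archive_dir, data_dir):
    """Download a .tgz from `url` (if not already present) and extract it.

    Hypothetical helper; not part of fastai.
    """
    archive_dir, data_dir = Path(archive_dir), Path(data_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)

    # Use the last component of the URL as the local file name.
    archive = archive_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))

    # Extract the gzip-compressed tar archive into data_dir.
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(data_dir))
    return data_dir
```

With the flowers URL from the example above, a call might look like fetch_and_extract(url, 'archive', 'data'), where both folder names are placeholders.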

Thanks. It has been a while but this is still helpful.

Is .as_posix() needed because the returned path is in Windows format?

Oh, then as_posix() is probably not needed. You just have to make sure untar_data gets a string, in which case str(path) works.
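To illustrate the difference between str(path) and path.as_posix(), here is a small standard-library sketch. The Windows path is a made-up example, not one taken from this thread; the pure path classes are used so the snippet runs on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_path = PurePosixPath("/home/gg/.fastai/archive/flower_photos.tgz")
# Hypothetical Windows location, for illustration only.
windows_path = PureWindowsPath(r"C:\Users\gg\.fastai\archive\flower_photos.tgz")

# On a POSIX path the two string forms are identical.
print(str(posix_path) == posix_path.as_posix())

# On a Windows path str() keeps backslashes, while as_posix() uses forward slashes.
print(str(windows_path))
print(windows_path.as_posix())
```

So on Linux or macOS either form gives the same string, and the choice only matters on Windows.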