Downloading and Extracting Different Data

I’ve been experimenting with the notebooks and examples from the course, but I’m trying to understand how Jupyter notebooks are finding, downloading, and extracting data.

Take, for example, the Pets dataset. What is “URLs.PETS”, and how would I use this method with a dataset I downloaded and had on my desktop (say, another academic dataset)? Would I pass in the URL from another website that hosts the dataset, or do you use a completely different method altogether?

I’ve had luck running the existing notebooks and tinkering, but as I lack experience in Python programming, wrapping my head around how it does what it does, even for basic things like this, is giving me some trouble. I feel like this is probably common sense to most people here, but to me it’s a major hurdle that’s preventing me from experimenting further and trying new things!

Thanks in advance for the help.

Hi there,

Each URL downloads data into a local directory - I won’t tell you where, as you can figure this out :slight_smile: Tracking down where it lands on your file system will be a good learning exercise to start with.

Once you know where it is, you can create a similar structure for your project and modify the path variable to point to the new file structure.

Cheers,
Dev.

I’m sorry, but I’m not following. I see that it’s downloading into a local directory. In the Pets example, I see the file path.

I think where I’m not fully understanding the connection is “URLs.PETS.” The help documentation says we have to pass a URL as an argument, but I don’t understand where this URL points. I’ll keep trying, but I feel like I’m missing something obvious?

The URL points to a .tgz (gzipped tar) archive stored somewhere off-site. fastai hosts a number of these datasets on its own servers for us to use. If you find any dataset distributed as a .tgz archive, you can pass its URL in as the url argument of untar_data(url=...).
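To make that concrete: URLs.PETS is just a string constant holding the web address of a .tgz archive, and untar_data downloads that archive once, extracts it into a local cache directory, and returns the extracted path. Here is a standard-library sketch of the extract step only (the file names and folders below are made up for the demo, not part of the real Pets dataset):

```python
import tarfile
import tempfile
from pathlib import Path

# untar_data(url) roughly does: download the url into a cache directory,
# then extract the archive there and return the extracted Path.
# This sketch covers only the extract step, using the standard library.

def extract_archive(archive: Path, dest: Path) -> Path:
    """Extract a .tgz / .tar.gz archive into dest and return dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest

# Demo: build a tiny archive, then extract it (stands in for a dataset .tgz).
work = Path(tempfile.mkdtemp())
(work / "images").mkdir()
(work / "images" / "cat_1.jpg").write_bytes(b"fake image data")
archive = work / "pets.tgz"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(work / "images", arcname="images")

out = extract_archive(archive, work / "extracted")
print(sorted(p.name for p in out.rglob("*")))  # ['cat_1.jpg', 'images']
```

So when you see untar_data(URLs.PETS) in a notebook, it is doing the download plus this extraction, and handing you back the folder it extracted into.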


The thing is, if you already have a dataset on your desktop, you can just put the data in the local directories where untar_data usually extracts things… that’s why I asked you to work out where the data is going.

Once the files are in the proper directories, you can use a PosixPath (the path variable in the lessons) to point to those directories and pass that to the fastai functions.
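For example, if your local dataset has one sub-folder per class, you can build the path yourself with pathlib and hand it to fastai exactly where the notebooks use the one returned by untar_data. The folder names below are hypothetical, and the fastai call is left as a comment since its exact form depends on your installed version:

```python
import tempfile
from pathlib import Path

# Suppose you copied your dataset into a local folder with one
# sub-directory per class, e.g.:
#   my_dataset/cats/...jpg
#   my_dataset/dogs/...jpg
root = Path(tempfile.mkdtemp()) / "my_dataset"  # stand-in for a desktop folder
for label in ("cats", "dogs"):
    (root / label).mkdir(parents=True)
    (root / label / f"{label}_0.jpg").write_bytes(b"fake image")

path = root  # plays the role of the path returned by untar_data

# You would then pass `path` to fastai just as the notebooks do, e.g.:
#   data = ImageDataBunch.from_folder(path, valid_pct=0.2)  # fastai v1 style
labels = sorted(p.name for p in path.iterdir() if p.is_dir())
print(labels)  # ['cats', 'dogs']
```

The key point is that fastai doesn’t care whether the path came from untar_data or from a folder you made yourself, as long as the directory layout is what the loading function expects.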
