Why does fast.ai use TAR files?

I’ve been reading about tar files (to understand what untar_data() means), and I can’t see why everything revolves around TAR files. An intuitive explanation would be great!

Hi Lobvh

It comes from UNIX. The name is short for “tape archive”. In UNIX, when you want to make a single file from a collection of files and keep the directory hierarchy, you use the tar command to produce that single file. tar only bundles the files; it is usually paired with a compressor such as gzip, which is why you see .tar.gz or .tgz files. ZIP came later, from the DOS/Windows world.

Regards Conwyn
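
To make the "single file that keeps the hierarchy" idea concrete, here is a minimal sketch using Python's standard tarfile module. It does not use fastai at all. The folder layout (toy_data, train, labels.csv) and the helpers make_tar and list_tar are made up for illustration.

```python
import tarfile
import tempfile
from pathlib import Path


def make_tar(src_dir: Path, tar_path: Path) -> None:
    """Bundle src_dir and everything below it into one uncompressed .tar file."""
    with tarfile.open(tar_path, "w") as tar:
        # arcname stores paths relative to the folder name, not the full temp path
        tar.add(src_dir, arcname=src_dir.name)


def list_tar(tar_path: Path) -> list:
    """Return the member paths stored in the archive, sorted."""
    with tarfile.open(tar_path, "r") as tar:
        return sorted(tar.getnames())


# Build a tiny, hypothetical dataset layout and archive it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "toy_data"
    (root / "train").mkdir(parents=True)
    (root / "train" / "a.txt").write_text("cat")
    (root / "labels.csv").write_text("a.txt,cat\n")

    archive = Path(tmp) / "toy_data.tar"
    make_tar(root, archive)

    # Each member keeps its relative path, so the folder structure survives.
    members = list_tar(archive)
    print(members)
```

The printed list should include the nested paths (for example the file inside train), which is the structure information a plain concatenation of files would lose.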

In the fastai library, download_data returns a pathlib.PosixPath pointing to the downloaded file, not the extracted contents, so you need a separate library to decompress or extract the data.
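
As a rough sketch of that extraction step, the standard library's shutil.unpack_archive can unpack a .tgz given only its path. Nothing here calls fastai: the archive is built locally as a stand-in for a downloaded file, and extract_archive is a hypothetical helper name.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def extract_archive(archive_path: Path, dest_dir: Path) -> Path:
    """Unpack archive_path into dest_dir; the format is inferred from the extension."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(dest_dir))
    return dest_dir


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Stand-in for a downloaded file: a small gzip-compressed tar archive.
    src = tmp / "sample"
    src.mkdir()
    (src / "readme.txt").write_text("hello")
    archive = tmp / "sample.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="sample")

    # Extract it again and read a file back from the restored folder.
    out = extract_archive(archive, tmp / "extracted")
    extracted_text = (out / "sample" / "readme.txt").read_text()
    print(extracted_text)
```

Only unpack archives from sources you trust, since a tar file can contain paths that point outside the destination folder.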

If you just need the MNIST data from fastai, here’s an easier way:

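# Assumes fastai v1, where the datasets module provides download_data
from fastai import datasets
import gzip, pickle

MNIST_URL = 'http://deeplearning.net/data/mnist/mnist.pkl'
# ext is appended to the URL; the return value is the local path to the .gz file
path = datasets.download_data(MNIST_URL, ext='.gz')
# The file is a gzip-compressed pickle, so decompress it while reading
with gzip.open(path, 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')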