I’m reading up on tar files (because of the name untar_data()), and I can’t understand why everything revolves around TAR files. An intuitive explanation would be much appreciated!
It comes from UNIX; the name stands for "tape archive". In UNIX, when you wanted to turn a collection of files into a single file while keeping the directory hierarchy intact, you used the tar command to produce that single file. tar itself only bundles files; compression is usually added on top with gzip, which is why you so often see .tar.gz or .tgz files. ZIP came later, from the MS-DOS/Windows world.
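To make the "many files, one archive, hierarchy preserved" idea concrete, here is a minimal sketch using Python's standard-library tarfile module (not fastai). It builds a small made-up directory tree, bundles it into one gzip-compressed archive, and lists what is stored inside. The directory and file names are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def build_demo_tree(root):
    """Create a small, hypothetical directory hierarchy under root."""
    for digit in ("3", "7"):
        (root / "sample" / "train" / digit).mkdir(parents=True)
        (root / "sample" / "train" / digit / "img.png").write_bytes(b"fake image")
    (root / "sample" / "labels.csv").write_text("name,label\n")


def archive_and_list(root):
    """Bundle root/sample into one gzip-compressed tar file; return member names."""
    archive = root / "sample.tgz"
    # "w:gz" writes a tar archive and gzip-compresses it (a .tar.gz / .tgz file)
    with tarfile.open(archive, "w:gz") as tar:
        # add() recurses into subdirectories; arcname stores the tree as
        # "sample/..." instead of under the full temporary path
        tar.add(root / "sample", arcname="sample")
    # Reopen the single file and read back the stored paths
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_tree(root)
    names = archive_and_list(root)

for name in names:
    print(name)
```

The printed names include paths such as sample/train/3/img.png, which shows that the folder structure lives inside the one .tgz file.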
In the fastai library, download_data returns a pathlib.PosixPath pointing to the downloaded file, not its extracted contents, so you need another library (for example gzip or tarfile) to decompress it and read the data.
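As a rough sketch of that extra step (using Python's standard tarfile module, not fastai's own code), the snippet below builds a small throwaway .tgz, treats its Path as if a download helper had returned it, and unpacks it with a hypothetical extract_tgz helper. For a single gzip-compressed file such as the MNIST pickle, gzip.open plays the same role.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_tgz(archive: Path, dest: Path) -> Path:
    """Hypothetical helper: unpack a .tgz archive into dest and return dest.

    The Path argument only says *where* the archive is; nothing is
    decompressed until a library such as tarfile actually reads the file.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter rejects unsafe member paths
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a tiny archive to stand in for a downloaded file (hypothetical data)
    (root / "hello.txt").write_text("hello from inside the archive")
    archive = root / "download.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(root / "hello.txt", arcname="hello.txt")

    print(type(archive).__name__)  # a Path object, not the data itself
    out = extract_tgz(archive, root / "extracted")
    extracted_text = (out / "hello.txt").read_text()

print(extracted_text)
```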
If you just need the MNIST data from fastai, here’s an easier way:
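```python
from fastai import datasets
import gzip
import pickle

MNIST_URL = 'http://deeplearning.net/data/mnist/mnist.pkl'
# Download the gzip-compressed pickle; download_data returns its local path
path = datasets.download_data(MNIST_URL, ext='.gz')

# Decompress with gzip and unpickle; the pickle was written by Python 2,
# so latin-1 decoding is needed. The third split is discarded.
with gzip.open(path, 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')
```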
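The loading pattern above assumes the file is a gzip-compressed pickle holding three (x, y) pairs, of which the snippet keeps the first two. The sketch below checks that pattern offline, without fastai or network access, using tiny made-up lists in place of the real MNIST arrays; the file name fake_mnist.pkl.gz is hypothetical.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

# Made-up stand-in data with the same nested structure as the MNIST pickle:
# ((x_train, y_train), (x_valid, y_valid), third_split)
fake = (([[0.0, 1.0]], [3]), ([[1.0, 0.0]], [7]), ([[0.5, 0.5]], [1]))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fake_mnist.pkl.gz"
    # Write: pickle the object and gzip-compress the bytes in one step
    with gzip.open(path, "wb") as f:
        pickle.dump(fake, f)
    # Read it back the same way as the MNIST snippet: gzip.open + pickle.load
    with gzip.open(path, "rb") as f:
        ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

print(y_train, y_valid)
```

Because gzip.open returns a file-like object, pickle.load can read straight from it without writing a decompressed copy to disk.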