Not a gzipped file

I am working through my first learning exercise (image classification). My images are in a gzipped tar file on Google Drive, and I copied the code from lesson 1, changing it to use a URL to my Google Drive file. I am getting the following error:
OSError: Not a gzipped file (b'<!')

I generated the gzip file in the Ubuntu bash shell on Windows using the command:
tar -zcvf images_30d_400.tar.gz images_30d_400

I am using Colab, and the original code from lesson 1 worked fine, so I know the environment works with fast.ai.

As suggested in another post, I have verified the file with gzip -v -t.

Thanks in advance
Tim

I have tried a stripped-down example to see if I can get the fast.ai routine untar_data to work…

  1. First create a text file with
    echo test this > test.txt

  2. Then tar and gzip it with
    tar -zcvf test.tar.gz test.txt

  3. Copied the file to Google Drive

  4. Get shared-link to file from Google Drive

  5. Paste this shared link into the code from Lesson 1, e.g.
    datapath = 'https://drive.google.com/open?id=1dZYXssF-w3HWUQAVg6_xQiikO_xzXC-t'
    path = untar_data(datapath); path

  6. Ran this but still got the error: OSError: Not a gzipped file (b'<!')
    (Line 1647 of /usr/lib/python3.6/tarfile.py)
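
Steps 1 and 2 can also be reproduced and checked locally in Python, which helps rule out the archive itself as the cause. A minimal sketch using only the standard library; the helper name `make_test_archive` is made up for illustration:

```python
import tarfile
import tempfile
from pathlib import Path

def make_test_archive(workdir):
    """Create test.txt and pack it into test.tar.gz, like steps 1 and 2."""
    workdir = Path(workdir)
    txt = workdir / "test.txt"
    txt.write_text("test this\n")
    archive = workdir / "test.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(txt), arcname="test.txt")
    return archive

# Build the archive in a scratch directory, then read it back as gzip.
archive = make_test_archive(tempfile.mkdtemp())
with tarfile.open(str(archive), "r:gz") as tar:
    names = tar.getnames()
print(names)  # ['test.txt']
```

If this local round trip works, the archive is fine and the problem is more likely in what gets downloaded from the URL.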

I have tried this both from the Ubuntu bash shell on Windows 10 and from Debian. This makes me wonder if it has something to do with the Google Drive URL (although the file opens successfully from a browser).

Any suggestions?
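
One way to narrow this down is to look at the first bytes of whatever was actually saved to disk. A gzip stream starts with the two bytes 0x1f 0x8b, while `b'<!'` is how an HTML page typically begins (for example `<!DOCTYPE html>`), which would suggest the link returned a web page rather than the archive. A minimal sketch; `sniff_download` is a hypothetical helper and not part of fastai:

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def sniff_download(fname):
    """Guess what a downloaded file really is from its first two bytes."""
    with open(fname, "rb") as f:
        head = f.read(2)
    if head == GZIP_MAGIC:
        return "gzip"
    if head in (b"<!", b"<?"):
        # '<!' usually starts an HTML page, '<?' an XML document
        return "markup (HTML/XML), not an archive"
    return f"unknown ({head!r})"
```

Point it at the file that untar_data saved; the exact location depends on your fastai configuration.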

I ran into the same issue. It turns out the file you host at a URL (i.e. the file you upload to Google Drive or Amazon S3) needs to have a .tgz extension instead of .tar.gz.

Rename: my-archive.tar.gz to my-archive.tgz

Then, you also need to remove (omit) the extension in the code like so…

data_url = 'https://my-bucket.s3.amazonaws.com/fastai/datasets/my-archive'
path = untar_data(url=data_url)

I’m not sure why this isn’t documented.
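
The rename and the extension-free URL can be sketched in Python. This only illustrates the convention described in this post (an archive named `.tgz`, and a URL passed without the extension); `to_tgz` and `url_without_ext` are hypothetical helper names, not fastai functions:

```python
from pathlib import Path

def to_tgz(archive):
    """Rename my-archive.tar.gz to my-archive.tgz and return the new path."""
    archive = Path(archive)
    if not archive.name.endswith(".tar.gz"):
        raise ValueError(f"expected a .tar.gz file, got {archive.name}")
    target = archive.with_name(archive.name[: -len(".tar.gz")] + ".tgz")
    archive.rename(target)
    return target

def url_without_ext(url):
    """Drop a trailing .tgz so the URL matches the form used above."""
    return url[: -len(".tgz")] if url.endswith(".tgz") else url
```

Renaming changes only the file name; the contents are the same gzipped tar either way.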

I'm receiving the same error.
I tried importing the CIFAR10 dataset from the fastai AWS bucket by passing the link to the untar_data() function. It gives me the same message.

cifar_data=untar_data('https://s3.amazonaws.com/fast-ai-imageclas/cifar10.tgz');

OSError: Not a gzipped file (b'<?')

During handling of the above exception, another exception occurred:

ReadError Traceback (most recent call last)
/usr/lib/python3.6/tarfile.py in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
1645 fileobj.close()
1646 if mode == 'r':
-> 1647 raise ReadError("not a gzip file")
1648 raise
1649 except:

ReadError: not a gzip file

Hi all, has this issue been resolved?
I’ve just started working with fastai v4 and get the “BadGzipFile” error.

I created the tgz file on my laptop and uploaded it to a public GitHub repo. I can download and extract the tgz using the URL: https://github.com/shraddhapai/shraddhapai.github.io/raw/master/fastai_data/jazz_goddesses.tgz

This is my code:

from fastai.vision.all import *
infile = 'https://github.com/shraddhapai/shraddhapai.github.io/raw/master/fastai_data/jazz_goddesses.tgz'
path = untar_data(url=infile)/'images'

This is the error message I get (long, below this message).

Any help would be appreciated - thanks.
Shraddha

BadGzipFile                               Traceback (most recent call last)
/opt/conda/envs/fastai/lib/python3.8/tarfile.py in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
   1671         try:
-> 1672             t = cls.taropen(name, mode, fileobj, **kwargs)
   1673         except OSError:

/opt/conda/envs/fastai/lib/python3.8/tarfile.py in taropen(cls, name, mode, fileobj, **kwargs)
   1648             raise ValueError("mode must be 'r', 'a', 'w' or 'x'")
-> 1649         return cls(name, mode, fileobj, **kwargs)
   1650 

/opt/conda/envs/fastai/lib/python3.8/tarfile.py in __init__(self, name, mode, fileobj, format, tarinfo, dereference, ignore_zeros, encoding, errors, pax_headers, debug, errorlevel, copybufsize)
   1511                 self.firstmember = None
-> 1512                 self.firstmember = self.next()
   1513 

/opt/conda/envs/fastai/lib/python3.8/tarfile.py in next(self)
   2312             try:
-> 2313                 tarinfo = self.tarinfo.fromtarfile(self)
   2314             except EOFHeaderError as e:

/opt/conda/envs/fastai/lib/python3.8/tarfile.py in fromtarfile(cls, tarfile)
   1101         """
-> 1102         buf = tarfile.fileobj.read(BLOCKSIZE)
   1103         obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)

/opt/conda/envs/fastai/lib/python3.8/gzip.py in read(self, size)
    291             raise OSError(errno.EBADF, "read() on write-only GzipFile object")
--> 292         return self._buffer.read(size)
    293 

/opt/conda/envs/fastai/lib/python3.8/_compression.py in readinto(self, b)
     67         with memoryview(b) as view, view.cast("B") as byte_view:
---> 68             data = self.read(len(byte_view))
     69             byte_view[:len(data)] = data

/opt/conda/envs/fastai/lib/python3.8/gzip.py in read(self, size)
    478                 self._init_read()
--> 479                 if not self._read_gzip_header():
    480                     self._size = self._pos

/opt/conda/envs/fastai/lib/python3.8/gzip.py in _read_gzip_header(self)
    426         if magic != b'\037\213':
--> 427             raise BadGzipFile('Not a gzipped file (%r)' % magic)
    428 

BadGzipFile: Not a gzipped file (b'<!')

During handling of the above exception, another exception occurred:

ReadError                                 Traceback (most recent call last)
<ipython-input-4-581bd3e57bb5> in <module>
      1 from fastai.vision.all import *
      2 infile = 'https://github.com/shraddhapai/shraddhapai.github.io/raw/master/fastai_data/jazz_goddesses.tgz'
----> 3 path = untar_data(url=infile)/'images'

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/external.py in untar_data(url, fname, dest, c_key, force_download, extract_func)
    257         if _get_check(url) and _check_file(fname) != _get_check(url):
    258             print(f"File downloaded is broken. Remove {fname} and try again.")
--> 259         extract_func(fname, dest.parent)
    260         rename_extracted(dest)
    261     return dest

/opt/conda/envs/fastai/lib/python3.8/site-packages/fastai/data/external.py in file_extract(fname, dest)
    217     if dest is None: dest = Path(fname).parent
    218     fname = str(fname)
--> 219     if   fname.endswith('gz'):  tarfile.open(fname, 'r:gz').extractall(dest)
    220     elif fname.endswith('zip'): zipfile.ZipFile(fname     ).extractall(dest)
    221     else: raise Exception(f'Unrecognized archive: {fname}')

/opt/conda/envs/fastai/lib/python3.8/tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
   1617             else:
   1618                 raise CompressionError("unknown compression type %r" % comptype)
-> 1619             return func(name, filemode, fileobj, **kwargs)
   1620 
   1621         elif "|" in mode:

/opt/conda/envs/fastai/lib/python3.8/tarfile.py in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
   1674             fileobj.close()
   1675             if mode == 'r':
-> 1676                 raise ReadError("not a gzip file")
   1677             raise
   1678         except:

ReadError: not a gzip file
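
To separate the download step from the extraction step, the archive can be fetched and checked with the standard library before anything is unpacked. This is a rough sketch rather than a replacement for untar_data; `fetch_and_extract` is a hypothetical helper, and whether a given URL serves the raw file or a web page depends on the host:

```python
import tarfile
import urllib.request
from pathlib import Path

def fetch_and_extract(url, dest):
    """Download url into dest, then extract it only if it is a real tar archive."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rstrip("/").split("/")[-1]
    urllib.request.urlretrieve(url, str(archive))
    if not tarfile.is_tarfile(str(archive)):
        # Show the start of the file so an HTML or XML response is easy to spot.
        head = archive.read_bytes()[:64]
        raise ValueError(f"{archive.name} is not a tar archive; it starts with {head!r}")
    # Default mode "r" detects the compression; only extract archives you trust.
    with tarfile.open(str(archive)) as tar:
        tar.extractall(str(dest))
    return dest
```

If the ValueError fires, its message shows the first bytes of what was downloaded, which is usually enough to tell an archive from a web page.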