Error encountered in the Sentiment part of Lesson4-imdb

SOLVED! See the update at the end of the message.

It didn’t work for me. It looks like the problem only happens on Windows 10 machines.

Here is what I did, step by step. Please see if you can spot anything different.

  1. I cloned the repository (master branch),
  2. opened a cmd prompt and activated the fastai env,
  3. ran python setup.py install --force,
  4. started the notebook,
  5. ran the first 2 cells and the Sentiment section.
  6. At the second step of the Sentiment section, I got the same error:
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-7-30850761a448> in <module>()
      1 IMDB_LABEL = data.Field(sequential=False)
----> 2 splits = torchtext.datasets.IMDB.splits(TEXT, IMDB_LABEL, 'data/')

D:\Anaconda3\envs\fastai\lib\site-packages\torchtext\datasets\imdb.py in splits(cls, text_field, label_field, root, train, test, **kwargs)
     52         return super(IMDB, cls).splits(
     53             root=root, text_field=text_field, label_field=label_field,
---> 54             train=train, validation=None, test=test, **kwargs)
     55 
     56     @classmethod

D:\Anaconda3\envs\fastai\lib\site-packages\torchtext\data\dataset.py in splits(cls, path, root, train, validation, test, **kwargs)
     70             path = cls.download(root)
     71         train_data = None if train is None else cls(
---> 72             os.path.join(path, train), **kwargs)
     73         val_data = None if validation is None else cls(
     74             os.path.join(path, validation), **kwargs)

D:\Anaconda3\envs\fastai\lib\site-packages\torchtext\datasets\imdb.py in __init__(self, path, text_field, label_field, **kwargs)
     31             for fname in glob.iglob(os.path.join(path, label, '*.txt')):
     32                 with open(fname, 'r') as f:
---> 33                     text = f.readline()
     34                 examples.append(data.Example.fromlist([text, label], fields))
     35 

D:\Anaconda3\envs\fastai\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24 
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 803: character maps to <undefined>
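
For anyone reading the traceback: the failing line in imdb.py calls open(fname, 'r') without an encoding, so on Windows Python falls back to the locale codec (cp1252 here). Below is a minimal sketch of that failure mode, assuming the review files are UTF-8 text. The sample sentence, the temp file, and the read_first_line helper are made up for illustration and are not part of torchtext.

```python
import os
import tempfile


def read_first_line(path, encoding=None):
    """Read the first line of a text file using the given encoding.

    With encoding=None, Python uses the platform default, which is
    typically cp1252 on Windows. (Hypothetical helper, not torchtext code.)
    """
    with open(path, 'r', encoding=encoding) as f:
        return f.readline()


# Made-up sample text containing a curly closing quote (U+201D).
# Its UTF-8 encoding is E2 80 9D, and byte 0x9d has no mapping in cp1252.
sample = 'He called it \u201ca classic\u201d.'
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'review.txt')
with open(sample_path, 'wb') as f:
    f.write(sample.encode('utf-8'))

# Decoding the UTF-8 bytes as cp1252 fails on byte 0x9d.
try:
    read_first_line(sample_path, encoding='cp1252')
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Passing the encoding explicitly reads the file correctly on any platform.
utf8_text = read_first_line(sample_path, encoding='utf-8')
```

This only illustrates why the error shows up on Windows and not elsewhere; it is not a patch for the installed package.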

I’m also using autoreload 2. I’m stuck.

UPDATE: I’ve just managed to fix it. I had to uninstall the older version with pip uninstall torchtext. The older version was installed directly in site-packages and was taking precedence over the newer one installed with python setup.py install. SOLVED!
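
If you suspect the same kind of shadowing, a quick way to check is to ask Python which file a package name resolves to before importing it. This is a small sketch using the standard importlib module; module_location is a hypothetical helper name, and the path it prints will depend on your environment.

```python
import importlib.util


def module_location(name):
    """Return the file Python would load for a top-level module name.

    Returns None if the module cannot be found. (Hypothetical helper.)
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Prints the path of the torchtext copy that would be imported,
# or None if torchtext is not installed in the active environment.
print(module_location('torchtext'))
```

If the printed path points at an old copy in site-packages rather than the version you just installed, that copy is the one being imported.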
