SOLVED! See the update at the end of the message.
It didn’t work for me. It looks like the problem only happens on Windows 10 machines.
Here is what I did, step by step. Please see if you can spot anything different.
- I cloned the repository (master branch),
- opened a cmd prompt and activated the fastai env
- ran
python setup.py install --force
- started the notebook
- ran the first 2 cells and the Sentiment section.
- At the second step of the sentiment section, I got the same error:
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-7-30850761a448> in <module>()
1 IMDB_LABEL = data.Field(sequential=False)
----> 2 splits = torchtext.datasets.IMDB.splits(TEXT, IMDB_LABEL, 'data/')
D:\Anaconda3\envs\fastai\lib\site-packages\torchtext\datasets\imdb.py in splits(cls, text_field, label_field, root, train, test, **kwargs)
52 return super(IMDB, cls).splits(
53 root=root, text_field=text_field, label_field=label_field,
---> 54 train=train, validation=None, test=test, **kwargs)
55
56 @classmethod
D:\Anaconda3\envs\fastai\lib\site-packages\torchtext\data\dataset.py in splits(cls, path, root, train, validation, test, **kwargs)
70 path = cls.download(root)
71 train_data = None if train is None else cls(
---> 72 os.path.join(path, train), **kwargs)
73 val_data = None if validation is None else cls(
74 os.path.join(path, validation), **kwargs)
D:\Anaconda3\envs\fastai\lib\site-packages\torchtext\datasets\imdb.py in __init__(self, path, text_field, label_field, **kwargs)
31 for fname in glob.iglob(os.path.join(path, label, '*.txt')):
32 with open(fname, 'r') as f:
---> 33 text = f.readline()
34 examples.append(data.Example.fromlist([text, label], fields))
35
D:\Anaconda3\envs\fastai\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 803: character maps to <undefined>
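For anyone reading the traceback: the failing open(fname, 'r') call passes no encoding, so on Windows Python falls back to the locale default (cp1252 here), and byte 0x9d has no mapping in that code page. Below is a minimal sketch of the same failure and of reading the file with an explicit encoding. The sample sentence and the temporary file name are made up for illustration, and I am assuming the review files are UTF-8; this is not the torchtext code itself.

```python
import os
import tempfile

# Hypothetical review text. The closing curly quote (U+201D) is encoded
# in UTF-8 as the bytes e2 80 9d, and 0x9d is undefined in cp1252.
sample = "He called it \u201cgreat\u201d."
raw = sample.encode("utf-8")

# Decoding the UTF-8 bytes as cp1252 reproduces the error in the traceback.
try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as err:
    cp1252_failed = True
    print("cp1252 failed:", err)

# Reading the same bytes from disk with an explicit encoding works on any OS.
with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "review.txt")  # made-up file name
    with open(fname, "wb") as f:
        f.write(raw)
    with open(fname, "r", encoding="utf-8") as f:
        text = f.readline()

print(text)
```

This would also explain why the problem only shows up on Windows: on most Linux and macOS setups the default encoding is already UTF-8.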
I’m also using autoreload 2. I’m stuck.
UPDATE: I’ve just managed to fix it. I had to uninstall the older version with pip uninstall torchtext. The older version was installed directly in site-packages and was taking precedence over the newer one installed with python setup.py install. SOLVED!
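If you suspect the same kind of shadowing, one quick check is to ask Python where it would load the package from before and after reinstalling. This is only a sketch using the standard library; locate_package is a helper name I made up, not something from torchtext or fastai.

```python
import importlib.util


def locate_package(name):
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the path of the copy that wins on sys.path, or None if the
# package is not installed in the active environment.
print(locate_package("torchtext"))
```

If the printed path points at a plain torchtext folder directly under site-packages rather than at the copy installed by python setup.py install, the old version is probably still the one being imported.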