Crestle - Spacy installation failed

Hi @anurag
spaCy is not able to download the “en” model. Can you please help me with this? It needs admin permission.

Based on, I’d recommend installing spacy in a virtualenv directory.

Thanks @anurag. Unfortunately, I’m not able to do the same in Crestle. I am now getting an error saying the pip installer is not available.

Maybe I am doing something wrong. Let me give it a try and get back to you.

Try: pip install spacy && python -m spacy download en
on it showed instructions on installing it.

I did that. Unfortunately, it needs root permission to create a symlink, which I don’t have on Crestle.

Are you able to do a conda install spacy?

I haven’t used crestle so I’m not sure how the environment is set up

I am getting an error message saying the conda command is not available, so I am stuck there as well.

Me too. I’ve read that we should be able to fix it by calling spacy link, but it’s not clear to me what we’re supposed to link or where it’s supposed to go :frowning:

BTW don’t forget on Crestle to use pip3, not pip.
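
One way to sidestep the pip vs. pip3 mix-up is to invoke pip through the interpreter that is actually running the notebook. This is only a sketch: the helper name `pip_install_command` is made up for illustration, and adding `--user` (a standard pip flag that installs into the per-user site directory) is an assumption about what might avoid needing admin rights on Crestle, not something confirmed in this thread.

```python
import sys

def pip_install_command(package, user_site=True):
    """Build a pip command tied to the interpreter running this code.

    Hypothetical helper for illustration. `python -m pip` guarantees the
    package lands in the same Python that the notebook uses.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if user_site:
        # Assumption: a per-user install avoids writing to system directories.
        cmd.append("--user")
    cmd.append(package)
    return cmd

install_cmd = pip_install_command("spacy")
# To actually run it from a notebook cell:
#   import subprocess; subprocess.check_call(install_cmd)
```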

Also, I don’t think you need the full en spacy module - I think the tokenizer might work with no additional installation steps…
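
If only tokenization is needed, something like the following might be enough. It is a sketch assuming spaCy 2.0 or later, where `spacy.blank("en")` builds an English pipeline from the language rules bundled with the package, so no model download or symlink is involved. The function name `tokenize_without_model` is just for illustration.

```python
def tokenize_without_model(text):
    """Tokenize English text using only spaCy's built-in tokenizer rules."""
    import spacy  # imported here so the sketch only needs spaCy when called

    # A blank pipeline has the tokenizer but no tagger, parser, or vectors.
    nlp = spacy.blank("en")
    return [token.text for token in nlp(text)]
```

Calling `tokenize_without_model("Hello world")` should give back the individual tokens as strings.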

For now, I’d recommend trying manual download:

I was able to get spaCy to load by changing the function to point to a manually downloaded copy:

spacy_en = spacy.load('~/courses/fastai2/courses/dl1/data/aclImdb/en_core_web_md-2.0.0')

but now I’m getting an error because it’s trying to read the data as ASCII, which is not what I’ve downloaded. I’m looking for a way to convert it over to UTF-8; any thoughts appreciated :slight_smile:

UnicodeDecodeError Traceback (most recent call last)
in ()
1 FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)
----> 2 md = LanguageModelData(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=10)

~/courses/fastai2/courses/dl1/fastai/ in __init__(self, path, field, train, validation, test, bs, bptt, **kwargs)
193 self.trn_ds,self.val_ds,self.test_ds = ConcatTextDataset.splits(
194 path, text_field=field, train=train, validation=validation, test=test)
--> 195 field.build_vocab(self.trn_ds, **kwargs)
196 self.pad_idx = field.vocab.stoi[field.pad_token]
197 self.nt = len(field.vocab)

/usr/local/lib/python3.6/dist-packages/torchtext/data/ in splits(cls, path, root, train, validation, test, **kwargs)
67 path =
68 train_data = None if train is None else cls(
---> 69 os.path.join(path, train), **kwargs)
70 val_data = None if validation is None else cls(
71 os.path.join(path, validation), **kwargs)

~/courses/fastai2/courses/dl1/fastai/ in __init__(self, path, text_field, newline_eos, **kwargs)
182 for p in paths:
183 for line in open(p): text += text_field.preprocess(line)
--> 184 if newline_eos: text.append('')
186 examples = [[text], fields)]

/usr/lib/python3.6/encodings/ in decode(self, input, final)
24 class IncrementalDecoder(codecs.IncrementalDecoder):
25 def decode(self, input, final=False):
---> 26 return codecs.ascii_decode(input, self.errors)[0]
28 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3680: ordinal not in range(128)

Someone else got that error too in this thread,

It worked when I ran the code on my home machine.

This is a common problem. The IMDB dataset text is already Unicode (UTF-8).
It’s just that your machine is trying to use the ASCII decoder, which won’t work.
When you open the file, you can explicitly specify the encoding to use.
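
As a rough illustration of passing the encoding explicitly, here is a small self-contained sketch. The helper `read_lines` and the temporary demo file are made up for the example and are not the fastai or torchtext code. The demo text contains "é", which UTF-8 stores as the bytes 0xc3 0xa9, so an ASCII decoder trips on 0xc3 much like in the traceback above.

```python
import os
import tempfile

def read_lines(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, encoding=encoding) as f:
        return f.read().splitlines()

# Write a small UTF-8 file containing a non-ASCII character.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("A clich\u00e9 ending.\n".encode("utf-8"))
    demo_path = tmp.name

try:
    utf8_lines = read_lines(demo_path)  # decodes cleanly as UTF-8
    try:
        read_lines(demo_path, encoding="ascii")  # mimics the failing environment
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True
finally:
    os.remove(demo_path)
```

The same idea applies wherever the dataset files are opened: passing `encoding="utf-8"` to `open` makes the read independent of how the machine's locale is configured.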

I wonder why any system would use ascii by default. As far as I can tell from Googling, UTF-8 is the default for Python 3.6

It’s torchtext, I guess.

Based on that, it looks like the root cause is something in the environment, since multiple people (on Crestle?) are getting the error.

@anurag is the environment set up to use UTF-8 by default, as in the link Arvind mentions?
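
For anyone who wants to check their own notebook, a quick sketch along these lines may help. In Python 3, `open()` called without an `encoding` argument falls back to `locale.getpreferredencoding(False)`, which depends on the machine's locale settings, whereas `sys.getdefaultencoding()` (used by `str.encode` and `bytes.decode`) is always UTF-8. The function name `describe_text_encoding` is just for illustration.

```python
import locale
import sys

def describe_text_encoding():
    """Report the encodings Python will use in the current environment."""
    return {
        # What open() uses when no encoding argument is given.
        "open() default": locale.getpreferredencoding(False),
        # What str.encode() / bytes.decode() use by default.
        "str.encode default": sys.getdefaultencoding(),
        # What print() uses when writing to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
    }

info = describe_text_encoding()
for name, value in info.items():
    print(f"{name}: {value}")
```

If the first entry reports an ASCII variant rather than UTF-8, file reads that do not specify an encoding will fail on non-ASCII bytes.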

Turns out it isn’t. I’ll deploy an updated environment later today and post here.


Thanks @anurag for quickly attending to all issues! We are thankful for the wonderful service you provide.


All new notebooks will now use en.UTF-8 as the default.


Up and running, thanks anurag.

