AttributeError: 'function' object has no attribute 'process_all'

I am trying to run the lang_model-arxiv.ipynb notebook using the new version of the fastai library, and I am encountering the following error:

data = TextLMDataBunch.from_folder(f'{PATH}all/',**FILES,tokenizer=my_spacy_tok, min_freq=10)

AttributeError Traceback (most recent call last)
in ()
----> 1 data = TextLMDataBunch.from_folder(f'{PATH}all/',**FILES,tokenizer=my_spacy_tok, min_freq=10)

/usr/local/lib/python3.6/dist-packages/fastai/text/ in from_folder(cls, path, train, valid, test, classes, tokenizer, vocab, **kwargs)
190 src = (TextList.from_folder(path, processor=processor)
191 .split_by_folder(train=train, valid=valid))
--> 192 src = src.label_for_lm() if cls==TextLMDataBunch else src.label_from_folder(classes=classes)
193 if test is not None: src.add_test_folder(path/test)
194 return src.databunch(**kwargs)

/usr/local/lib/python3.6/dist-packages/fastai/ in _inner(*args, **kwargs)
418 self.valid = fv(*args, **kwargs)
419 self.__class__ = LabelLists
--> 420 self.process()
421 return self
422 return _inner

/usr/local/lib/python3.6/dist-packages/fastai/ in process(self)
465 "Process the inner datasets."
466 xp,yp = self.get_processors()
--> 467 for i,ds in enumerate(self.lists): ds.process(xp, yp, filter_missing_y=i==0)
468 return self

/usr/local/lib/python3.6/dist-packages/fastai/ in process(self, xp, yp, filter_missing_y)
594 filt = array([o is None for o in self.y])
595 if filt.sum()>0: self.x,self.y = self.x[~filt],self.y[~filt]
--> 596 self.x.process(xp)
597 return self

/usr/local/lib/python3.6/dist-packages/fastai/ in process(self, processor)
66 if processor is not None: self.processor = processor
67 self.processor = listify(self.processor)
---> 68 for p in self.processor: p.process(self)
69 return self

/usr/local/lib/python3.6/dist-packages/fastai/text/ in process(self, ds)
239 tokens = []
240 for i in progress_bar(range(0,len(ds),self.chunksize), leave=False):
--> 241 tokens += self.tokenizer.process_all(ds.items[i:i+self.chunksize])
242 ds.items = tokens

AttributeError: 'function' object has no attribute 'process_all'

My complete notebook can be found here.

Now I am defining my tokenizer as:

w = ['<SUMM>','<CAT>','<TITLE>','<BR />','<BR>']

I hope this fixes the problem (this step takes ages to run on my platform).

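Your tokenizer should be a fastai Tokenizer, not a plain function. As the last frame of the traceback shows, the processor calls self.tokenizer.process_all, and a function has no such attribute.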
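To illustrate why a bare function fails here, below is a minimal pure-Python sketch. It does not use fastai: `my_tok`, `MiniTokenizer` and `run_processor` are hypothetical stand-ins that only mimic the `self.tokenizer.process_all(...)` call from the traceback.

```python
# Hypothetical stand-ins, not fastai code: they only mimic the attribute
# lookup that fails at `self.tokenizer.process_all(...)` in the traceback.

def my_tok(text):
    """A plain tokenizing function, like a bare function passed as `tokenizer`."""
    return text.split()


class MiniTokenizer:
    """Toy object exposing the `process_all` method the processor expects."""

    def __init__(self, tok_func):
        self.tok_func = tok_func

    def process_all(self, texts):
        # Tokenize every text in the batch.
        return [self.tok_func(t) for t in texts]


def run_processor(tokenizer, texts):
    """Mimics the failing line: calls `process_all` on whatever was passed in."""
    return tokenizer.process_all(texts)


texts = ["hello world", "fastai text data"]

# Passing the bare function raises AttributeError, as in the original post.
try:
    run_processor(my_tok, texts)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Passing an object that has `process_all` works.
tokens = run_processor(MiniTokenizer(my_tok), texts)

print(error_message)
print(tokens)
```

The point is only that the processor relies on a `process_all` method being present, so whatever is passed as the tokenizer has to be an object that provides it.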


Hi Sylvain,
I’m trying to do something similar: modify parameters on SpacyTokenizer in the fastai v1 library, using the Data Block API.

As an initial reality check, I tried to name the SpacyTokenizer in my Data Block API call using the syntax in the fastai v1 documentation on:

About two-thirds of the way down that page, in the section “The TextList input classes”, the doc says:

Basic ItemList for text data:

vocab contains the correspondence between ids and tokens, pad_idx is the id used for padding. You can pass a custom processor in the kwargs to change the defaults for tokenization or numericalization. It should have the following form:

processor = [TokenizeProcessor(tokenizer=SpacyTokenizer('en')), NumericalizeProcessor(max_vocab=30000)]

I copied this line into my Data Block API for creating the databunch for building a language model:

data_lm4 = (TextList.from_csv(path, 'notes_gt_lt.csv', cols='note_text',
processor = [TokenizeProcessor(tokenizer=SpacyTokenizer('en')), NumericalizeProcessor(max_vocab=30000)])
.databunch(bs=5, num_workers=2))

But when I ran it, I got the error:

AttributeError: 'SpacyTokenizer' object has no attribute 'process_all'

Can you see an error here? Thank you, Dana

Oh yeah the docs are wrong. You need to create the tokenizer like this:

tokenizer = Tokenizer(SpacyTokenizer, 'en')

then pass it in a processor like this:

processor = [TokenizeProcessor(tokenizer=tokenizer), NumericalizeProcessor(max_vocab=30000)]

I’ll update the docs.
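As a rough illustration of the difference between the two forms, here is a pure-Python sketch of the wrapper pattern. It is not fastai's implementation: `ToySpacyTokenizer` and `ToyTokenizer` are hypothetical stand-ins, and the sketch assumes only what the error message shows, namely that the low-level tokenizer object has no `process_all` while the wrapper built from the tokenizer class and the language does.

```python
# Hypothetical stand-ins, not fastai's classes.

class ToySpacyTokenizer:
    """Stand-in for a low-level tokenizer: it only splits a single text."""

    def __init__(self, lang):
        self.lang = lang

    def tokenizer(self, text):
        # Naive whitespace split, standing in for real tokenization.
        return text.lower().split()


class ToyTokenizer:
    """Stand-in wrapper: receives the tokenizer class plus a language
    and exposes the batch-level `process_all` method."""

    def __init__(self, tok_func, lang):
        self.tok_func = tok_func
        self.lang = lang

    def process_all(self, texts):
        # Instantiate the low-level tokenizer, then apply it to each text.
        tok = self.tok_func(self.lang)
        return [tok.tokenizer(t) for t in texts]


low_level = ToySpacyTokenizer("en")              # like SpacyTokenizer('en')
wrapped = ToyTokenizer(ToySpacyTokenizer, "en")  # like Tokenizer(SpacyTokenizer, 'en')

has_low = hasattr(low_level, "process_all")
has_wrapped = hasattr(wrapped, "process_all")
result = wrapped.process_all(["Hello World"])

print(has_low, has_wrapped, result)
```

Note that the wrapper is given the tokenizer class itself rather than an instance, which matches the `Tokenizer(SpacyTokenizer, 'en')` form above.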

Thank you, Sylvain! It worked!