I understand that the code below is supposed to tokenize text, but it won’t work without the first() function. Why is that, and what does first() do?
spacy = WordTokenizer()
toks = first(spacy([txt]))
print(coll_repr(toks, 30))
With first() I get the output:
(#460) ['Jones','certified','U.S.','Senate','winner','despite','Moore','challenge','.','(','Reuters',')','-','Alabama','officials','on','Thursday','certified','Democrat','Doug','Jones' etc.
Without first() I get:
TypeError: object of type 'generator' has no len()
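
The error message points at a likely cause: calling the tokenizer on a list of texts seems to return a generator that yields one token list per text, and coll_repr appears to need something with a length. Here is a minimal sketch of that behavior in plain Python. Note that toy_tokenizer and first_item are hypothetical stand-ins written for illustration, not the actual WordTokenizer or first() implementations.

```python
def toy_tokenizer(texts):
    # Hypothetical stand-in for WordTokenizer: a generator function that
    # yields one list of tokens per input text.
    for text in texts:
        yield text.split()


def first_item(iterable):
    # Hypothetical stand-in for first(): return the first item of any iterable.
    return next(iter(iterable))


txt = "Jones certified U.S. Senate winner"

gen = toy_tokenizer([txt])  # a generator object, not a list of tokens
try:
    len(gen)  # generators have no length
except TypeError as e:
    error_message = str(e)  # object of type 'generator' has no len()

# Pull the token list for the first (and only) text out of the generator.
toks = first_item(toy_tokenizer([txt]))
print(len(toks), toks)  # 5 ['Jones', 'certified', 'U.S.', 'Senate', 'winner']
```

If that is what is happening, then first() is just unwrapping the single result: [txt] is a list with one text, so the generator produces one token list, and first() returns that list so its length can be printed.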