What is the role of the first() function?

I understand that the code below is supposed to tokenize text, but it won’t work without the first() function. Why is that, and what does it do?

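from fastai.text.all import *  # provides WordTokenizer, first and coll_repr
spacy = WordTokenizer()
toks = first(spacy([txt]))  # txt holds the text to tokenize
print(coll_repr(toks, 30))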

With first() I get this output:
(#460) ['Jones','certified','U.S.','Senate','winner','despite','Moore','challenge','.','(','Reuters',')','-','Alabama','officials','on','Thursday','certified','Democrat','Doug','Jones' etc.

Without it I get:
TypeError: object of type 'generator' has no len()

Hi William, by default WordTokenizer returns a generator that yields one list of tokens per input text, so you need to call first (which returns the first element of the generator) to display your result. That is also where your error comes from: coll_repr calls len() on its argument, and a generator has no length. With a single input string, as here, it might not be obvious why a generator is needed, but normally you would pass a list of many strings to WordTokenizer, and a generator lets it produce the tokenized texts one at a time instead of all at once. I prepared a screenshot to help you understand.
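To see the same behaviour without fastai, here is a minimal sketch in plain Python, assuming a hypothetical `toy_tokenizer` as a stand-in for WordTokenizer (it splits on spaces instead of using spaCy, but likewise returns a generator of token lists), and a simplified `first` that mimics fastcore's first() without its optional filter argument.

```python
# Hypothetical stand-in for WordTokenizer: splits on whitespace instead of
# using spaCy, but likewise returns a generator yielding one token list per text.
def toy_tokenizer(texts):
    return (text.split() for text in texts)

# Simplified version of fastcore's first() (the real one also accepts an
# optional filter): the first element of any iterable, or None if it is empty.
def first(x):
    return next(iter(x), None)

txt = "Jones certified U.S. Senate winner despite Moore challenge"

gen = toy_tokenizer([txt])
try:
    len(gen)  # coll_repr calls len() on its argument, so this is what fails
except TypeError as e:
    print(e)  # object of type 'generator' has no len()

toks = first(toy_tokenizer([txt]))
print(toks)  # ['Jones', 'certified', 'U.S.', 'Senate', 'winner', 'despite', 'Moore', 'challenge']
```

Since the tokenizer yields one token list per text, first() simply takes the list for the first (here, the only) text.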


That is also why we pass [txt] rather than txt: the tokenizer expects a list of texts as input.
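A plain string is itself iterable, character by character, so passing txt directly would make each character count as a separate text. A minimal sketch, again assuming a hypothetical `toy_tokenizer` in place of WordTokenizer and a simplified `first`:

```python
# Hypothetical stand-in for WordTokenizer: yields one token list per text
def toy_tokenizer(texts):
    return (text.split() for text in texts)

# Simplified version of fastcore's first()
def first(x):
    return next(iter(x), None)

txt = "Alabama officials on Thursday"

# A list holding one text: first() gives that text's tokens
print(first(toy_tokenizer([txt])))  # ['Alabama', 'officials', 'on', 'Thursday']

# A bare string is iterated character by character,
# so the first "text" the tokenizer sees is just 'A'
print(first(toy_tokenizer(txt)))  # ['A']
```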
Hope it helps!

Charles


Hi, I am also having trouble understanding the function first(). It comes up in chapter 4, on page 171:

xb, yb = first(dl)

I read what I could find through a Google search, but the explanation I found didn’t make sense to me. Is this a standard Python function? PyTorch? fastai? I’ve never seen it before.

A detailed explanation would be helpful.
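first() is not built into Python or PyTorch; it comes from fastcore, the utility library fastai is built on (`from fastcore.basics import first`), and fastai's star imports bring it into scope. Iterating over a DataLoader yields batches, so first(dl) returns the first batch, a tuple of inputs and targets, which `xb, yb = ...` unpacks. A minimal sketch, assuming a hypothetical `ToyDataLoader` in place of a real DataLoader and a simplified `first`:

```python
# Simplified version of fastcore's first() (the real one also accepts an
# optional filter): the first item an iterable yields, or None if empty.
def first(x):
    return next(iter(x), None)

class ToyDataLoader:
    "Hypothetical stand-in for a DataLoader: iterating yields (xb, yb) batches"
    def __init__(self, data, batch_size):
        self.data, self.batch_size = data, batch_size

    def __iter__(self):
        # Each pass starts from the beginning, like a real DataLoader
        for i in range(0, len(self.data), self.batch_size):
            batch = self.data[i:i + self.batch_size]
            yield [x for x, _ in batch], [y for _, y in batch]

data = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
dl = ToyDataLoader(data, batch_size=2)

# first(dl) grabs only the first batch; the tuple is then unpacked
xb, yb = first(dl)
print(xb, yb)  # [0, 1] ['a', 'b']
```

Here first(dl) is essentially next(iter(dl)): it takes one batch without looping over the whole DataLoader, which is handy for checking shapes. Because a DataLoader can be iterated again, each call starts a fresh pass (shuffled, if shuffling is on).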