Name 'spacy_tok' is not defined


(Jorge Barrios) #1

In lesson 4, tokenization fails in this line:

' '.join(spacy_tok(review[0]))

with the error:

name 'spacy_tok' is not defined

How can we solve this?

Conda environment and fast.ai library are up to date.

Thanks


(Bharadwaj Srigiriraju) #2

I remember fixing this… can you do a git pull from origin once again?

That line should be more like:
' '.join([sent.string.strip() for sent in spacy_tok(review[0])])

Also remember to download the English models for spaCy if this is the first time you're running it.


(Allie Crevier) #3

To download the English model for spaCy:

python -m spacy download en

After doing this, spacy_tok = spacy.load('en') will load the model and spacy_tok will be defined.
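A minimal sketch of the fix, assuming spaCy is installed. It uses spacy.blank('en'), which builds a tokenizer-only English pipeline, so it runs even before the model download. With the downloaded model you would use spacy.load('en') as above (newer spaCy releases dropped the 'en' shortcut in favor of the package name 'en_core_web_sm'). The review list is a hypothetical stand-in for the notebook's data.

```python
import spacy

# Tokenizer-only English pipeline: no `python -m spacy download` needed.
# With the downloaded model, spacy_tok = spacy.load('en') plays the same role.
spacy_tok = spacy.blank("en")

# Hypothetical sample review, standing in for the notebook's data.
review = ["This movie was great, I'd watch it again!"]

doc = spacy_tok(review[0])  # a Doc, i.e. a sequence of Token objects
print(' '.join(tok.text for tok in doc))
```

Once spacy_tok is assigned in an earlier cell, the NameError goes away because the name now exists in the notebook's namespace.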


(gram) #4

A minor question about the way I rewrote your code: is it more in line with the Zen of Python? (I'm new at Python.)

' '.join([str(i) for i in spacy_tok(review[0])])  

A bigger question: what do I do about the next cell, which uses spacy_tok again here…

TEXT = data.Field(lower=True, tokenize=spacy_tok)

(Bharadwaj Srigiriraju) #5

Yup, you can! str seems to return the correct string representation of each token in the Doc. Another way to write the same thing would be to use map: ' '.join(map(str, spacy_tok(review[0]))).
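To check that these spellings agree, here is a small sketch, again assuming spaCy is installed and using a tokenizer-only spacy.blank('en') pipeline in place of the downloaded model. Iterating over a Doc yields Token objects, and str(token) returns the token's text, so the list comprehension, the map version and token.text all produce the same string. token.text is also the form that keeps working in spaCy v3, where the .string attribute used in the earlier snippet was removed.

```python
import spacy

spacy_tok = spacy.blank("en")  # stand-in for spacy.load('en')
text = "Loved it. Would watch again!"  # hypothetical review text

doc = spacy_tok(text)

joined_list = ' '.join([str(i) for i in doc])    # list comprehension with str()
joined_map = ' '.join(map(str, doc))             # map version
joined_text = ' '.join(tok.text for tok in doc)  # explicit .text attribute

print(joined_list)
```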

This has been updated in the notebook; check the latest version. It has to be:
TEXT = data.Field(lower=True, tokenize="spacy")
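For context on the change: torchtext's Field expects tokenize to be a callable that turns a string into a list of strings, whereas calling the spaCy pipeline object directly returns a Doc. The sketch below shows a hypothetical spacy_tokenize helper with that shape, roughly what the "spacy" option sets up for you. It assumes spaCy is installed and again uses spacy.blank('en') in place of the downloaded model. Passing tokenize=spacy_tokenize to data.Field should behave similarly, but check against your torchtext version (newer releases moved Field to torchtext.legacy and later dropped it).

```python
import spacy

nlp = spacy.blank("en")  # stand-in for spacy.load('en')

def spacy_tokenize(text):
    """Hypothetical helper: split a string into a list of token strings."""
    # nlp.tokenizer runs only the tokenizer and returns a Doc;
    # take each token's text so the result is a plain list of str.
    return [tok.text for tok in nlp.tokenizer(text)]

print(spacy_tokenize("Hello, world!"))
```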


(gram) #6

Can confirm

I just followed the instructions this chap left in this comment (below) and now it works.
Question: Do I need to clone the repo into a new fastai directory and run setup.py install every time I need to update the repo? I want to keep my modifications to certain notebooks, but I also want the latest code and modules, so maybe a whole new directory makes sense for me. Just wondering if this is how it's done in general.


Lesson 4: Error running LanguageModelData.from_text_files
(Bharadwaj Srigiriraju) #7

git pull is the way to update the repo. Jeremy mentions this in the videos, and it must have been mentioned somewhere in the forums too. As for keeping your own changes, I think everyone has their own way.

For me, I do:
git add . && git stash - to stash all my changes,
then git pull - to update the notebooks and library,
then git stash apply - to apply my changes back (usually there won't be any conflicts)

and continue working on the latest repo.

This is not the best approach, just a quick and dirty way to pull updates and get back to working on my changes. You can google those commands if you want to work the same way. Cloning to a separate folder and switching between two repos (and their updates/bugs) is an unnecessary pain I specifically wanted to avoid.


(gram) #8

I do remember “git pull” now.
I don’t know where the information goes when I watch these videos. I’ve watched the first six and some repeatedly and retained pretty much nothing.
I managed to point some directories at my own images in some cells of notebook 1. I tried for a few days to get it to label the classifications it makes with the file locations of the images. I dug through the notes and forums about how to do this, but it still didn't work. If you happen to know how, I'd appreciate an answer.
So yeah, all I can really do after watching these lessons and looking at the notebooks is hit shift+return.