TextDataBunch gets stuck

santhoshetty · January 1, 2019, 8:00pm

TextDataBunch gets stuck when creating a language model databunch.
It does not proceed any further. Can anyone please point me to what the issue could be?

santhoshetty · January 1, 2019, 10:18pm

The problem occurs in the labeling step for the language model.

wgpubs · January 2, 2019, 12:38am

What happens when label_for_lm() is called?
This is where your training and validation ItemLists are created using your pre-processing function responsible for tokenizing your text, building (or applying) your vocab, and numericalizing your tokens based on that vocab.
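To make those three steps concrete, here is a rough sketch in plain Python. It is not fastai's actual implementation: the whitespace tokenizer, the "<unk>" token, and the function names tokenize, build_vocab, and numericalize are all made up for illustration. The point it shows is that the vocab is built from the training texts and then only applied to the validation texts.

```python
from collections import Counter

def tokenize(text):
    # Hypothetical stand-in for a real tokenizer: lowercase, split on whitespace.
    return text.lower().split()

def build_vocab(token_lists, min_freq=1):
    # Count every token and keep those seen at least min_freq times.
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Index 0 is reserved for unknown tokens (an assumption of this sketch).
    itos = ["<unk>"] + sorted(t for t, c in counts.items() if c >= min_freq)
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

def numericalize(tokens, stoi):
    # Map each token to its id, falling back to 0 ("<unk>") if unseen.
    return [stoi.get(tok, 0) for tok in tokens]

train_texts = ["The cat sat", "the dog sat down"]
valid_texts = ["the bird sat"]

# Training set: tokenize, BUILD the vocab, then numericalize.
train_tokens = [tokenize(t) for t in train_texts]
itos, stoi = build_vocab(train_tokens)
train_ids = [numericalize(toks, stoi) for toks in train_tokens]

# Validation set: tokenize, then APPLY the existing vocab.
valid_ids = [numericalize(tokenize(t), stoi) for t in valid_texts]

print(itos)
print(train_ids)
print(valid_ids)
```

Tokenization is the first step and it touches every row of text, so a single problematic row can hold up everything that comes after it.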

Looking at the code, you’ll notice that the progress bar pops up during tokenization, so I’m inclined to believe that, for some reason, the tokenizer is having problems with your training dataset (ItemList).

Some things to try:

Review your .csv file and make sure all looks kosher
What happens if you change your random split pct? For example, try splitting it by 0.5 … what happens then?
And make sure you have the latest version of fastai (I think it’s 0.39). You can check by including the following in your notebook:

import fastai  # makes fastai.__version__ available even without a star import
print(f'fastai version: {fastai.__version__}')
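
For the .csv review suggested above, a quick scan along these lines might help. It is only a sketch using the standard library. The function name find_suspect_rows, the column name "text", and the max_chars threshold are assumptions to adapt to your file, and empty or very long rows are just things I would look at first, not a confirmed cause of the hang.

```python
import csv
import io

def find_suspect_rows(csv_file, text_col, max_chars=100_000):
    """Return (row_number, reason) pairs for rows worth inspecting by hand."""
    suspects = []
    reader = csv.DictReader(csv_file)
    # Row 1 is the header, so data rows are numbered from 2.
    # (Numbers are approximate if fields contain embedded newlines.)
    for row_num, row in enumerate(reader, start=2):
        text = row.get(text_col)
        if text is None or not text.strip():
            suspects.append((row_num, "empty text"))
        elif len(text) > max_chars:
            suspects.append((row_num, "very long text"))
    return suspects

# Tiny in-memory example; for a real file use open(path, newline="") instead.
sample = io.StringIO("label,text\n0,hello world\n1,\n0," + "x" * 50 + "\n")
print(find_suspect_rows(sample, "text", max_chars=20))
```

If this flags anything, try dropping those rows and running the databunch creation again to see whether it still stalls.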