I’m trying to build my own NLP project using a Kaggle dataset loaded into a DataFrame, and for the most part things seem to be building out OK. However, I have run into two errors that I think are related.
First, after I build my datablock:
dls_block = DataBlock(
    blocks=(TextBlock.from_df('blurb', is_lm=True), CategoryBlock),
    get_x=ColReader('text'),
    get_y=ColReader('state'),
    splitter=RandomSplitter(0.1)
).dataloaders(df, path=path, bs=16, seq_len=80)
I try to show a batch using dls_block.show_batch(), but receive this error:
IndexError Traceback (most recent call last)
<ipython-input-10-e3057ceec2e6> in <module>()
----> 1 dls_block.show_batch()
19 frames
/usr/local/lib/python3.7/dist-packages/fastcore/foundation.py in <listcomp>(.0)
117 return (self.items.iloc[list(i)] if hasattr(self.items,'iloc')
118 else self.items.__array__()[(i,)] if hasattr(self.items,'__array__')
--> 119 else [self.items[i_] for i_ in i])
120
121 def __setitem__(self, idx, o):
IndexError: list index out of range
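One thing I’ve been wondering: this DataBlock mixes is_lm=True with a CategoryBlock, and the pure language-model pattern I’ve seen in the fastai docs drops the classification pieces entirely. For comparison, this is the LM-only version I understand to be standard (a sketch of the pattern, not something I’ve confirmed fixes the error):

from fastai.text.all import *

# Language-model DataBlock: a single TextBlock with is_lm=True,
# no CategoryBlock and no get_y (the targets are the shifted tokens)
dls_lm = DataBlock(
    blocks=TextBlock.from_df('blurb', is_lm=True),
    get_x=ColReader('text'),  # from_df puts the tokenized text in a 'text' column
    splitter=RandomSplitter(0.1)
).dataloaders(df, path=path, bs=16, seq_len=80)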
I can, however, create a language_model_learner with this DataBlock and successfully call fit_one_cycle. But things go wrong again when I try to call learn.predict with
TEXT = "A new"
N_WORDS = 40
N_SENTENCES = 2
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75)
         for _ in range(N_SENTENCES)]
and I get this error message:
ValueError Traceback (most recent call last)
<ipython-input-14-7599bd22d406> in <module>()
3 N_SENTENCES = 2
4 preds = [learn.predict(TEXT, N_WORDS, temperature=0.75)
----> 5 for _ in range(N_SENTENCES)]
1 frames
/usr/local/lib/python3.7/dist-packages/fastai/text/learner.py in predict(self, text, n_words, no_unk, temperature, min_p, no_bar, decoder, only_last_word)
157 self.model.reset()
158 idxs = idxs_all = self.dls.test_dl([text]).items[0].to(self.dls.device)
--> 159 if no_unk: unk_idx = self.dls.vocab.index(UNK)
160 for _ in (range(n_words) if no_bar else progress_bar(range(n_words), leave=False)):
161 with self.no_bar(): preds,_ = self.get_preds(dl=[(idxs[None],)])
ValueError: 'xxunk' is not in list
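In case it helps narrow things down, my hunch is that both errors share a cause: with blocks=(TextBlock, CategoryBlock), I believe dls_block.vocab ends up holding two vocabs (tokens and categories) rather than the single token vocab that predict expects when it calls self.dls.vocab.index(UNK). A quick check I intend to run (diagnostic sketch only):

# Inspect what the DataLoaders' vocab actually contains
print(type(dls_block.vocab), len(dls_block.vocab))
# If this shows a list-like of length 2, the outer list holds
# [token_vocab, category_vocab], so 'xxunk' lives inside the first
# element rather than in the outer list, which would match the
# ValueError above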
Any help or advice would be appreciated!
Thanks,
Alex