NLP Prediction KeyError "tensor(0)"

HaIDsIEx · February 12, 2020, 1:37am

Hello,

I followed this project to fine-tune an existing German model: jfilter/ulmfit-for-german (pre-trained German language model with sub-word tokenization for ULMFiT).
Everything worked so far, but after training the text_classifier_learner I can't predict on new data. Every time I call learn.predict(str) I get the error from the title: KeyError: tensor(0).

What I did so far:
I load the language_model_learner like this (the model was trained before fastai 1.0.53; see "Major new changes and features", post #8 by sgugger):

config = awd_lstm_lm_config.copy()
config['n_hid'] = 1150
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5, pretrained=False, config=config)
learn_lm.load('./nets/ulmfit_for_german_jfilter')

I fine-tune it and save it with learn_lm.save_encoder('enc').

Then I load it with the text_classifier_learner:

config = awd_lstm_clas_config.copy()
config['n_hid'] = 1150
learn = text_classifier_learner(data_train, AWD_LSTM, drop_mult=0.5, config=config)
learn.load_encoder('enc', device='cuda:0')

and trained the classifier on existing data.

Everything works fine (accuracy during training looks OK), but I can't predict anything with learn.predict("str") because of the error mentioned above.

Does anybody have an idea what the problem could be?

Yours,
Pa

sgugger · February 12, 2020, 3:15am

No one can answer without knowing how you assembled your data. learn.predict expects its input in the same format.

HaIDsIEx · February 12, 2020, 2:30pm

Unfortunately, I do not know what you mean by "assembled".
I used an existing word-embedding layer like this:

bpemb_de = BPEmb(lang="de", vs=25000, dim=300)
itos = dict(enumerate(bpemb_de.words + ['xxpad']))
voc = Vocab(itos)
df_valid = df_valid.text.apply(lambda x: bpemb_de.encode_ids_with_bos_eos(clean(x, stp_lang='german')))

But I think I know what you mean now. I just wrote this function for prediction:

def con(x):
    return torch.from_numpy(np.array(bpemb_de.encode_ids_with_bos_eos(clean(x, stp_lang='german'))))

learn.predict(con("str"))

But now I get the error:

ValueError: only one element tensors can be converted to Python scalars

Am I still getting something wrong?

Basically, I followed the instructions in this notebook.

dkay · October 8, 2020, 12:09am

I did exactly the same and got the same error.

Did you solve it?

dkay · October 8, 2020, 12:24am

I found out what it is.

The itos has to be converted to a list (in the code above it is a dict).
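A likely explanation, sketched below without fastai or torch so it runs anywhere: a 0-d tensor can be used as a list index, but as a dict key it is hashed by object identity, so it never matches the integer key 0. The ScalarLike class is a hypothetical stand-in for such a tensor and the three-word vocabulary is made up for illustration; neither comes from the original code.

```python
class ScalarLike:
    """Hypothetical stand-in for a 0-d torch tensor: it can act as a list
    index via __index__, but it is hashed by object identity, not by value."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value

    def __repr__(self):
        return f"tensor({self.value})"


words = ["<unk>", "der", "die"]      # made-up vocabulary for illustration
itos_dict = dict(enumerate(words))   # same construction as in the post above
itos_list = list(words)              # the fix: a plain list

idx = ScalarLike(0)

# List indexing calls __index__, so the lookup succeeds.
print(itos_list[idx])

# Dict lookup hashes the object itself, so the integer key 0 is never found.
try:
    itos_dict[idx]
except KeyError as err:
    print("KeyError:", err)
```

If that is indeed the cause, building the vocabulary as itos = bpemb_de.words + ['xxpad'] (a list rather than dict(enumerate(...))) before passing it to Vocab should be enough.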

chris3 · October 13, 2020, 2:34pm

Could someone provide a notebook with a working example?

I am also trying to implement a German model, but the code at https://github.com/jfilter/ulmfit-for-german
seems very outdated and uses an old version of fastai.

In particular, I do not know how to use the following lines:

TextClasDataBunch.from_ids(...)

I think the class should now be TextDataLoaders, but it no longer has a 'from_ids' method.

Vocab(itos)

I cannot find this class. Is it based on 'torchtext.vocab.Vocab'?

Thanks for any hints.