Fastai.text test predictions

How do I make predictions on a test dataset with fastai.text? I tried learn.get_preds, but I get wrong results. Maybe the data is shuffled somehow.

First I train the model:

from fastai.text.all import *
from sklearn.metrics import f1_score

dls = TextDataLoaders.from_df(df, text_col='text', label_col='target', seq_len=36)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5,
                                metrics=skm_to_fastai(f1_score), seq_len=36)
learn.fine_tune(4, 1e-2)

Then I load a dataset (the same one in this example) and make predictions on it:

dl_test = learn.dls.test_dl(df, with_labels=True)
preds = learn.get_preds(dl=dl_test, with_decoded=True)
df['preds'] = preds[2] # I assume that `preds[1]` are the targets and `preds[2]` are the predicted labels

The result is close to random.

f1_score(df['target'], df['preds'])
0.398

If I use learn.predict instead, the results are good, but it is very slow.

df['preds'] = df['text'].map(lambda x: learn.predict(x)[0]) 

Comparing the targets and predicted labels returned by get_preds with each other also gives the proper result:

f1_score(preds[1], preds[2])
0.86

For information, here is the format of preds:

preds
(tensor([[0.8350, 0.1650],
         [0.8271, 0.1729],
         [0.7271, 0.2729],
         ...,
         [0.7816, 0.2184],
         [0.7872, 0.2128],
         [0.7755, 0.2245]]),
 TensorCategory([0, 0, 0,  ..., 0, 0, 0]),
 tensor([0, 0, 0,  ..., 0, 0, 0]))

Should that not be preds?

Yes, it is preds. I have corrected the post, thanks. But the question does not change.

What's in df.head() for both train and test?
Edit: Sorry, I reread the post; the labels are in target.

Next question:

Where is the printed output (loss, etc.) from training and fine-tuning?

I posted only the problematic part. The full notebook is here - https://www.paperspace.com/krasin/notebook/przpxr5ey
(playing with https://www.kaggle.com/c/nlp-getting-started data)

@sgugger any idea what I am doing wrong with fastai.text test_dl usage?

You need to tokenize your dataframe with tokenize_df I’d say.

No change. Both the results and the labels in the output of learn.get_preds(...) are shuffled.

Yes, the dataloader returns the results sorted by text length, to be memory-efficient.
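
To illustrate why the scores above disagree, here is a toy sketch in plain Python (no fastai involved). The `fake_model` function and the four sample texts are made up for illustration, and the longest-first ordering is only an assumption about how a length-sorted dataloader might behave. It shows that predictions and targets stay aligned with each other after sorting, but not with the original rows, and how the original order can be restored from the sort indices.

```python
# Toy illustration only: the "model" and the data are hypothetical stand-ins.
texts = ["a b c d", "a", "a b c", "a b"]
targets = [1, 0, 1, 0]

def fake_model(text):
    # Stand-in classifier: label 1 if the text has 3 or more tokens.
    return 1 if len(text.split()) >= 3 else 0

# Process longest texts first, as a length-sorted dataloader might.
order = sorted(range(len(texts)),
               key=lambda i: len(texts[i].split()),
               reverse=True)
sorted_preds = [fake_model(texts[i]) for i in order]
sorted_targets = [targets[i] for i in order]

# Inside the sorted output, predictions and targets still line up,
# which is why scoring one against the other looks fine.
print(sorted_preds == sorted_targets)

# Against the original row order they no longer match.
print(sorted_preds == targets)

# Restore the original order: output position k belongs to row order[k].
restored = [None] * len(texts)
for k, i in enumerate(order):
    restored[i] = sorted_preds[k]
print(restored == targets)
```

Assigning the sorted predictions straight into the dataframe corresponds to the second comparison, where the rows no longer match.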

Thank you, that makes sense. Can we restore the original order somehow? I’ll try sorting the dataframe.

I added reorder=True to Learner.get_preds, so the results will now be returned in the original order by default.

Now it works great :)