Inconsistency in predictions with text classification

Hi!

I have a problem with prediction quality when using different methods to get predictions. When I use the get_preds function, I get better results than when I predict line by line with predict:

from sklearn.metrics import classification_report

# Batch predictions over the whole validation set
res = cls_learn.get_preds(ordered=True)
preds = [p.argmax().item() for p in res[0]]
labels = [y.data for y in cls_db.valid_ds.y]
print(classification_report(labels, preds))

              precision    recall  f1-score   support

           0       0.97      0.95      0.96      2760
           1       0.91      0.93      0.92      1382

   micro avg       0.95      0.95      0.95      4142
   macro avg       0.94      0.94      0.94      4142
weighted avg       0.95      0.95      0.95      4142

def predict(text):
    # predict returns (category, class index, probabilities); take the probs
    probs = cls_learn.predict(text)[2]
    return probs.argmax().item()

preds_by_one = cls_df_val.text.apply(predict)
print(classification_report(labels, preds_by_one))

              precision    recall  f1-score   support

           0       0.97      0.93      0.95      2760
           1       0.87      0.94      0.91      1382

   micro avg       0.94      0.94      0.94      4142
   macro avg       0.92      0.94      0.93      4142
weighted avg       0.94      0.94      0.94      4142

Am I doing something wrong? My fastai version is 1.0.42.

No, you’re not: don’t forget that padding is added to make batches when you use get_preds, whereas predict doesn’t need to do that.
This is where the difference comes from. Getting text_classifier_learner to ignore padding in its predictions (and thus behave the same in both cases) is in the mid-term plans.
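
To see the effect concretely, you can compare the two paths on a single validation text. A minimal sketch, reusing cls_learn, cls_db and cls_df_val from above, and assuming cls_df_val rows are in the same order as cls_db.valid_ds:

import torch

text = cls_df_val.text.iloc[0]

# Single-item path: the sequence goes through the model without padding.
single_probs = cls_learn.predict(text)[2]

# Batch path: get_preds pads every sequence in a batch up to the longest
# one, so shorter texts are fed to the model with extra pad tokens.
batch_probs = cls_learn.get_preds(ordered=True)[0][0]

# If padding influences the model, these will differ slightly.
print(torch.allclose(single_probs, batch_probs, atol=1e-4))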

Thank you! I suspected that, so I tried adding one xxpad at the beginning of the line when predicting line by line, but to no avail.
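
In case anyone wants to reproduce the experiment: one xxpad probably isn't enough, because the amount of padding depends on the longest sequence in each batch. A rough sketch of the idea, where pad_len is a made-up stand-in for that per-batch length:

# Hypothetical: prepend pad tokens up to a fixed length before predicting.
# pad_len is a guess; real batch padding varies with the longest sequence.
def predict_padded(text, pad_len=70):
    n_tokens = len(text.split())  # crude token count, just for illustration
    n_pad = max(0, pad_len - n_tokens)
    return cls_learn.predict('xxpad ' * n_pad + text)[2].argmax().item()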

Any updates on getting that to happen?

This feature is in v2.
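
For reference, the v2 equivalent looks roughly like this. A minimal sketch assuming a DataFrame df with text and label columns (the column names and hyperparameters are placeholders, not from this thread):

from fastai.text.all import *

# Build DataLoaders straight from a DataFrame.
dls = TextDataLoaders.from_df(df, text_col='text', label_col='label', valid_pct=0.2)

learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4)

# In v2 padding is handled consistently, so both paths should agree:
preds, targs = learn.get_preds()          # batched, over the validation set
learn.predict("a single piece of text")   # one item at a time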

Ahh, how difficult is the migration?