Difference in result during inference when using learn.get_preds() or learn.predict()


#1

I am trying to do inference on a tabular dataset and noticed that I get different results depending on whether I run inference on the test set as

learn.get_preds(DatasetType.Test)

or

df.apply(learn.predict, axis=1)

Here is a simple reproducible notebook gist with the needed pickle files. Check the confusion matrix in both cases.

It might be that I am making a mistake in how the library is being used; any feedback is welcome. Thanks!
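
Roughly, this is the kind of comparison I mean; a minimal sketch using ADULT_SAMPLE as a stand-in for my data (the split and column choices here are just illustrative, not taken from the actual gist):

from fastai.tabular import *

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

dep_var = 'salary'
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [FillMissing, Categorify, Normalize]

test_df = df.iloc[900:1000].copy()
data = TabularDataBunch.from_df(path, df.iloc[:900], dep_var, valid_idx=range(800, 900),
                                procs=procs, cat_names=cat_names, cont_names=cont_names,
                                test_df=test_df)
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.fit_one_cycle(1)

# Path 1: batch inference over the test set
batch_preds, _ = learn.get_preds(DatasetType.Test)

# Path 2: row-by-row inference on the same dataframe
row_preds = test_df.apply(learn.predict, axis=1)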


#2

This comes from the fact that the dataframe you used for the test set has been modified by the fastai library (the processors apply their transforms in place). I’ll look into fixing it, but in the meantime, reload your dataframe to get the same results with predict.
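
In other words, something like this (a rough sketch; the csv path is a placeholder) should make predict line up with get_preds again:

test_df = pd.read_csv('test.csv')        # reload the untouched dataframe
# or keep a pristine copy around: test_df = original_test_df.copy()
row_preds = test_df.apply(learn.predict, axis=1)
batch_preds, _ = learn.get_preds(DatasetType.Test)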

Edit: This is now fixed in master.


#3

Thanks for the quick fix, Sylvain. I can confirm it is working fine. I have noticed a small issue with the data.classes value when using TabularDataBunch for a classification task: it reports the categorical values as classes, which are not the unique classes of the dependent variable. It does have the correct count of classes under data.c, though. If I want to see the correct values I need to apply a simple fix: data.classes = data.valid_ds.y.classes. I think the reason for this behavior is this code block. Not sure what the best way to fix it is.
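
For anyone running into the same thing, the check and workaround look roughly like this (sketch):

print(data.c)                            # correct number of classes
print(data.classes)                      # lists the categorical values, not the target classes
data.classes = data.valid_ds.y.classes   # workaround: take the classes from the labelled valid set
print(data.classes)                      # now the unique classes of the dependent variable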


#4

With a text_classifier_learner I still run into the same issue of getting different results using learn.get_preds(DatasetType.Test, ordered=True) and df.apply(learn.predict, axis=1). Was the fix specific to tabular data?
Reloading the dataframe, or creating a deep copy with df_copy = df.copy(), helps to get the same categories as get_preds(), but the prediction probabilities still differ quite a bit.
Any ideas why this is happening?


#5

The issue was specific to tabular, or so I thought. I’d need more details and a reproducible example to fix this.


#6

Ok, forget about copying the dataframe. I was applying the predict method to the entire dataframe instead of the text column, which yielded some weird results…
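
Concretely, the difference was (sketch):

# Wrong: passes each full row (a pandas Series) to predict
preds = test_data.apply(classifier.predict, axis=1)

# Right: pass only the raw text of each row
preds = test_data.text.apply(classifier.predict)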

However, I still get different classes and probabilities when applying predict() and get_preds() to the test dataframe. Here is a minimal example based on the IMDB text classification tutorial. It’s not supposed to classify well, just a quick and lightweight way to reproduce the issue.

Am I missing something essential, or are these two methods even supposed to return the same results?

from fastai.text import *

path = untar_data(URLs.IMDB_SAMPLE)
data = pd.read_csv(path/'texts.csv')

# Disjoint train/valid/test splits drawn from the sample
train_data = data.sample(150, random_state=42)
valid_data = data.drop(train_data.index).sample(50, random_state=42)
test_data = data.drop(train_data.index.append(valid_data.index)).sample(50, random_state=42)

# Language-model data and classification data sharing the same vocab
data_lm = TextLMDataBunch.from_df(path, train_df=train_data, valid_df=valid_data)
data_clas = TextClasDataBunch.from_df(path, train_df=train_data, valid_df=valid_data, test_df=test_data, bs=4, vocab=data_lm.vocab)

# Fine-tune the language model and save its encoder
learn = language_model_learner(data_lm, drop_mult=0.5, pretrained_model=URLs.WT103)
learn.fit_one_cycle(1, 1e-2)
learn.unfreeze()
learn.fit_one_cycle(1, 1e-3)
learn.save_encoder('encoder')

# Train the classifier on top of the fine-tuned encoder
classifier = text_classifier_learner(data_clas, drop_mult=0.2)
classifier.load_encoder('encoder')

classifier.fit_one_cycle(1, 1e-2)
classifier.freeze_to(-2)
classifier.fit_one_cycle(1, slice(5e-3/2., 5e-3))
classifier.unfreeze()
classifier.fit_one_cycle(3, slice(2e-3/100, 2e-3))
classifier.save('classifier')

# Batch inference over the test set
pred, _ = classifier.get_preds(DatasetType.Test, ordered=True)
preds_prob, preds_class = pred.max(1)

# Row-by-row inference on the raw text column
predict_df = test_data.text.apply(classifier.predict)
predict_df_class = [x[0].obj for x in predict_df.values]
predict_df_prob = [max(x[2].tolist()) for x in predict_df.values]

print(preds_class[:10])
print(predict_df_class[:10])
print(preds_prob[:10])
print(predict_df_prob[:10])

# Compare a single sample from both paths
sample_nr = 5
sample_text = test_data.iloc[sample_nr].text
print(classifier.predict(sample_text))
print('{}, {}'.format(preds_class[sample_nr], pred[sample_nr]))

#7

They are supposed to return the same thing. I’ll look at this tomorrow.


#8

Eh! It took me a long time to figure out why the results are different, but the answer is very simple: when you use the test set, your texts are padded so they can be put together in a batch, and that padding is not (yet) ignored by the model. In predict, your text is alone, so no padding is needed.
There was also a small issue of not adding the BOS token at the beginning, but I took care of that.
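
To illustrate the padding effect (a plain PyTorch sketch, not the fastai internals): an RNN that doesn’t mask pad tokens ends up with a different hidden state for the same text once it is padded out to the batch length.

import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(100, 8, padding_idx=1)   # assume token id 1 is the pad token
rnn = nn.LSTM(8, 8, batch_first=True)

alone  = torch.tensor([[5, 7, 9]])                  # the text on its own (predict)
padded = torch.tensor([[1, 1, 1, 1, 1, 5, 7, 9]])   # same text, front-padded to the batch length

_, (h_alone, _)  = rnn(emb(alone))
_, (h_padded, _) = rnn(emb(padded))
print(torch.allclose(h_alone, h_padded))    # False: the pad steps still change the hidden state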


#9

Thanks so much for figuring this out, @sgugger!
Is it correct to conclude that so far the classifier.predict(text) results are “more reliable” because they don’t show these padding effects?


#10

It depends on how you will use your model at inference time: will you have a lot of texts and feed them batch by batch? Or will you feed them one by one? Depending on which, you’ll trust one approach over the other.