This comes from the fact that the dataframe you used for tests has been modified by the fastai library (the processors apply their transform inplace). I’ll look at why to try to fix it, but in the meantime, reload your dataframe to get the same results with predict.
Thanks for the quick fix, Sylvain. I can confirm it is working fine. I have noticed a small issue in the data.classes value when using TabularDataBunch for a classification task: it stores the categorical values as classes, which are not the unique classes of the dependent variable. Although it does report the correct count of classes under data.c, I need a small fix if I want to see the correct values: data.classes = data.valid_ds.y.classes. I think the reason for this behavior is this code block. Not sure what the best way to fix it is.
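A plain-Python sketch of the mismatch (the values here are hypothetical, not fastai internals): the classes list should be the unique values of the dependent variable, which is what the data.valid_ds.y.classes workaround recovers.

```python
# Hypothetical data, for illustration only (no fastai involved).
dep_var = ["high", "low", "low", "medium", "high"]          # dependent variable
all_cat_values = ["red", "blue", "high", "low", "medium"]   # every categorical value seen

# What data.classes reportedly ends up holding: all categorical
# values, not just the target classes.
buggy_classes = all_cat_values

# What it should hold, and what data.valid_ds.y.classes gives:
# the unique classes of the dependent variable.
expected_classes = sorted(set(dep_var))
print(expected_classes)  # → ['high', 'low', 'medium']
```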
With a text_classifier_learner I still run into the same issue: learn.get_preds(DatasetType.Test, ordered=True) and df.apply(learn.predict, axis=1) return different results. Was the fix specific to tabular data?
Reloading the dataframe, or creating a deep copy with df_copy = df.copy(), helps to get the same categories as get_preds(); however, the prediction probabilities still differ quite a bit.
Any ideas why this is happening?
OK, forget about copying the dataframe. I was applying the predict method to the entire dataframe instead of just the text column, which yielded some weird results…
However, I still get different classes and probabilities when applying predict() and get_preds() to the test dataframe. Here is a minimal example based on the IMDB text classification tutorial. It’s not supposed to classify well; it’s just a quick, lightweight way to replicate the issue.
Am I missing something essential, or are these two methods even supposed to return the same results?
Ah! It has taken me a long time to figure out why the results are different, but the answer is very simple: when you use the test set, your texts are padded so they can be put together in a batch, and that padding is not (yet) ignored by the model. In predict, your text is alone, so no padding is needed.
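A toy illustration of why unmasked padding changes the result (plain Python, not fastai code): a fake one-number "RNN" with a bias term ends up in a different final state when pad tokens are prepended, because even a pad token (id 0) still updates the hidden state.

```python
PAD = 0  # hypothetical id for the pad token in this toy example

def rnn_last_hidden(tokens, decay=0.5, bias=0.1):
    """Toy recurrent state: read tokens left to right, return final state.

    The bias means even PAD tokens (value 0) move the state, mimicking
    a model that does not mask out padding.
    """
    h = 0.0
    for t in tokens:
        h = decay * h + t + bias
    return h

short = [3, 1]               # a short text on its own (predict-style)
padded = [PAD] * 3 + short   # same text, left-padded to batch length (get_preds-style)

print(round(rnn_last_hidden(short), 5))   # → 2.65
print(round(rnn_last_hidden(padded), 5))  # → 2.69375  (padding shifted the output)
```

If the model masked the pad positions (skipped them in the loop), both calls would return the same value; that is exactly the "padding is not yet ignored" issue described above.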
There was also a small issue of not adding the BOS token at the beginning but I took care of that.
It depends on how you will use your model at inference time: will you have a lot of texts and feed them batch by batch, or will you feed them one by one? Depending on which, you’ll trust one approach over the other.
As you can see, taking the output at the last timestep for the second sentence will not be correct; mostly we will just get neutral (in sentiment analysis). If there were a way to get the output after the fifth timestep (in our example, after this), we would get the correct output.
Note that the padding is applied first (so it would be xxpad xxpad … xxbos I in the second example). get_preds returns the final output, so it no longer has anything per timestep (it’s just the two probabilities for positive/negative in the case of sentiment analysis).
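A minimal sketch of the batching behavior described here (plain Python, not the fastai implementation; only the xxpad/xxbos token names come from fastai): each text gets a BOS token prepended, then shorter texts are left-padded to the length of the longest one.

```python
PAD, BOS = "xxpad", "xxbos"  # fastai's special token names

def pad_batch(texts):
    """Left-pad tokenized texts to a common length, as described above.

    Sketch only: prepend BOS to each text, then prepend PAD tokens so
    every sequence matches the longest one in the batch.
    """
    toks = [[BOS] + t for t in texts]
    n = max(len(t) for t in toks)
    return [[PAD] * (n - len(t)) + t for t in toks]

batch = pad_batch([["i", "loved", "it"], ["great"]])
print(batch[0])  # → ['xxbos', 'i', 'loved', 'it']
print(batch[1])  # → ['xxpad', 'xxpad', 'xxbos', 'great']
```

With left-padding, the real tokens of every text end at the final timestep, which is why taking the last output works once the pads come first.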