Learn.get_preds on test acts differently than on val

MorP · July 19, 2021, 1:31pm

Hi,

I am using fastai v1 (currently I can't update, unfortunately).
I have trained a tabular model and was about to use it for prediction on a test set.
Before that, as a sanity check of my understanding of how predictions work in fastai, I ran learn.get_preds on the validation set.
Then I added the validation set as a test set (using data.valid_ds.x.inner_df) and ran get_preds again,
as you can see in the code:

from fastai.tabular import *  # fastai v1: provides torch, DatasetType, TabularList and the tabular procs

# Predictions on the validation set, then the predicted class (0/1) for each row
validation_preds = learn.get_preds(DatasetType.Valid)
validation_bin_preds = torch.argmax(validation_preds[0], dim=1)

# Build a test set as a TabularList that should be equivalent to the validation set
cat_names = ['a', 'b']
cont_names = ['c', 'd']
procs = [FillMissing, Categorify, Normalize]
my_list = TabularList.from_df(df=data.valid_ds.x.inner_df, cat_names=cat_names,
                              cont_names=cont_names, procs=procs)
data.add_test(my_list)  # add_test expects a TabularList
test_preds = learn.get_preds(DatasetType.Test)
test_bin_preds = torch.argmax(test_preds[0], dim=1)
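The Normalize proc in the list above standardizes each continuous column using a mean and standard deviation computed on the training set. As a rough illustration only (plain Python, not fastai's code; fit_normalize, apply_normalize, the small epsilon and the toy numbers are all hypothetical), this shows that running values that are already normalized through the same step a second time changes them. That is one thing worth ruling out, assuming data.valid_ds.x.inner_df holds processed rather than raw values, which I have not confirmed:

```python
from statistics import mean, stdev

def fit_normalize(train_values):
    """Hypothetical helper: mean and sample std of a continuous training column."""
    return mean(train_values), stdev(train_values)

def apply_normalize(values, mu, sigma, eps=1e-7):
    """Hypothetical helper: standardize values with statistics fitted on training data."""
    return [(v - mu) / (sigma + eps) for v in values]

train_c = [10.0, 20.0, 30.0, 40.0]  # toy training column
valid_c = [15.0, 35.0]              # toy validation column (raw values)

mu, sigma = fit_normalize(train_c)
once = apply_normalize(valid_c, mu, sigma)  # raw values normalized once
twice = apply_normalize(once, mu, sigma)    # same step applied to already-normalized values
print(once)   # roughly [-0.77, 0.77]
print(twice)  # both values end up strongly negative
```

A model trained on the "once" scale would see very different inputs on the "twice" scale, which could plausibly shift how many rows cross the decision boundary.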

The results (validation_bin_preds and test_bin_preds) were very different! validation_bin_preds had 900 positives, but when predicting on the same examples as a test set (i.e., test_bin_preds) I got only about 150 positives!

(I am doing binary 0/1 classification)
(I am not referring to the order of the predictions or to the ‘target’ part of the result.)
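To make the comparison concrete, here is a small plain-Python sketch of the same check: an argmax over each row of class probabilities (what torch.argmax(..., dim=1) does on the get_preds output) followed by a count of positives per run. The helper names and toy probabilities are made up for illustration, and the row-by-row agreement count assumes both runs return rows in the same order:

```python
def argmax_rows(probs):
    """Per-row argmax over [p0, p1] rows, analogous to torch.argmax(preds, dim=1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]

def compare_binary(preds_a, preds_b):
    """Count positives in each 0/1 prediction list and how many rows agree."""
    assert len(preds_a) == len(preds_b), "both runs must cover the same rows"
    agree = sum(a == b for a, b in zip(preds_a, preds_b))
    return {"pos_a": sum(preds_a), "pos_b": sum(preds_b), "agree": agree, "n": len(preds_a)}

# Toy class probabilities for the same four rows, predicted two different ways
valid_probs = [[0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.9, 0.1]]
test_probs = [[0.7, 0.3], [0.6, 0.4], [0.4, 0.6], [0.9, 0.1]]

stats = compare_binary(argmax_rows(valid_probs), argmax_rows(test_probs))
print(stats)  # {'pos_a': 2, 'pos_b': 1, 'agree': 3, 'n': 4}
```

If the two runs really saw identical inputs, pos_a and pos_b should match exactly and agree should equal n.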

Do you know what could be the reason for that?
I figured I have to build a TabularList for my test set, but since I am apparently already missing something with the validation set, I am not sure what to do…

Thanks,
Mor

MorP · July 21, 2021, 1:11pm

Perhaps you could help me? @muellerzr