I have trained and exported a model on tabular data, and I am now following the Inference Learner tutorial to generate predictions on a test set.
I load my test dataframe with:
```python
test_df = pd.read_csv("data/testSet.csv", dtype=dtypes, nrows=5)
```
I use the same `cat_names`, `cont_names` and `procs` that I used for training when loading my `test_df` dataframe into a `TabularList`:

```python
data = TabularList.from_df(test_df, cat_names=cat_names, cont_names=cont_names, procs=procs)
```
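As background on why `procs` matter here: in fastai, tabular processors such as `Categorify` and `Normalize` learn their state (category codes, means and standard deviations) from the training set and reuse it for validation and test data. The plain-Python sketch below illustrates that idea with two hypothetical helpers, `fit_procs` and `apply_procs`; it is not fastai's implementation and ignores details such as missing-value handling. It shows that the same row is encoded differently when the state is refitted on test rows instead of reused from training.

```python
from statistics import mean, pstdev

# Hypothetical helpers, NOT fastai's API: a toy stand-in for
# Categorify + Normalize that shows where their state comes from.
def fit_procs(train_rows, cat_names, cont_names):
    """Learn category codes and normalisation stats from training rows only."""
    state = {"codes": {}, "stats": {}}
    for name in cat_names:
        # Code 0 is reserved for categories never seen in training.
        uniques = sorted({row[name] for row in train_rows})
        state["codes"][name] = {v: i + 1 for i, v in enumerate(uniques)}
    for name in cont_names:
        values = [row[name] for row in train_rows]
        # Fall back to 1.0 so a constant column does not divide by zero.
        state["stats"][name] = (mean(values), pstdev(values) or 1.0)
    return state

def apply_procs(row, state):
    """Encode one row with previously fitted state (no refitting)."""
    cats = [codes.get(row[name], 0) for name, codes in state["codes"].items()]
    conts = [(row[name] - m) / s for name, (m, s) in state["stats"].items()]
    return cats, conts

train_rows = [{"color": "red", "age": 20.0}, {"color": "blue", "age": 40.0}]
test_row = {"color": "red", "age": 40.0}
test_rows = [test_row, {"color": "red", "age": 50.0}]

# Intended: the test row is encoded with the TRAINING state.
print(apply_procs(test_row, fit_procs(train_rows, ["color"], ["age"])))  # ([2], [1.0])
# Refitting on the test rows changes the encoding of the very same row.
print(apply_procs(test_row, fit_procs(test_rows, ["color"], ["age"])))  # ([1], [-1.0])
```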
I load my learner and mark my data as test data:
```python
learner = load_learner('data/', test=data)
```
I generate predictions with `.get_preds()` and print them:

```python
preds, y = learner.get_preds(ds_type=DatasetType.Test)
print(preds)
```

```
tensor([[0.8476, 0.1524],
        [0.7529, 0.2471],
        [0.8152, 0.1848],
        [0.8072, 0.1928],
        [0.7275, 0.2725]])
```
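For reference, each row of `preds` holds the class probabilities for one test row, while `.predict()` returns a `(category, class index, probabilities)` triple for a single item. A quick way to compare the two at the class level is to take the argmax of each row. The sketch below does this in plain Python on the rounded values printed above; `argmax` is a small helper defined here, not a fastai function.

```python
# Rounded probabilities copied from the get_preds() output.
preds = [
    [0.8476, 0.1524],
    [0.7529, 0.2471],
    [0.8152, 0.1848],
    [0.8072, 0.1928],
    [0.7275, 0.2725],
]

def argmax(row):
    """Helper (not fastai): index of the largest probability, i.e. the class index."""
    return max(range(len(row)), key=row.__getitem__)

labels = [argmax(row) for row in preds]
print(labels)  # [0, 0, 0, 0, 0]

# Each row should be a probability distribution over the two classes.
assert all(abs(sum(row) - 1.0) < 1e-3 for row in preds)
```

With the real tensor, `preds.argmax(dim=1)` gives the same indices. All five rows come out as class 0, matching the `Category 0` returned by `.predict()`, so the disagreement is in the probabilities rather than in the predicted labels.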
To double-check, I use the same learner to predict directly on an item of the `TabularList`:

```python
learner.predict(data[1])
```

```
(Category 0, tensor(0), tensor([0.9867, 0.0133]))
```

Strangely, this doesn't give the same result as above. I also tried predicting on a row of the dataframe (`test_df`) itself:

```python
learner.predict(test_df.iloc[0])
```

```
(Category 0, tensor(0), tensor([0.5344, 0.4656]))
```
I wouldn't expect the last one to work (the preprocessing steps haven't been applied to the dataframe), but I don't understand why the other two give me different answers.
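To make the comparison like-for-like, each single-row result should be checked against the batch row in the same position. Assuming `data[1]` refers to the second row of `test_df` (0-based indexing), it corresponds to row 1 of `preds`, while `test_df.iloc[0]` corresponds to row 0. The sketch below uses a hypothetical `matches_batch` helper with a small tolerance to allow for the 4-decimal rounding of the printed tensors.

```python
import math

# Rounded probabilities from the get_preds() output in this post.
batch_preds = [
    [0.8476, 0.1524],
    [0.7529, 0.2471],
    [0.8152, 0.1848],
    [0.8072, 0.1928],
    [0.7275, 0.2725],
]

def matches_batch(batch, idx, single, tol=1e-3):
    """Hypothetical helper: does a single-row prediction equal batch row `idx`?

    `tol` absorbs the 4-decimal rounding of the printed tensors.
    """
    row = batch[idx]
    return len(row) == len(single) and all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(row, single)
    )

# data[1] is assumed to be the SECOND row, so compare it with row 1.
print(matches_batch(batch_preds, 1, [0.9867, 0.0133]))  # False
# test_df.iloc[0] is the first row, so compare it with row 0.
print(matches_batch(batch_preds, 0, [0.5344, 0.4656]))  # False
```

Both checks fail with the values from this post, and `[0.9867, 0.0133]` does not match any row of `preds`, so the mismatch is not only a matter of comparing the wrong rows.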
Have I misunderstood something? Are there any additional steps required (putting something in `.eval()` mode, for example) when calling either `.predict()` or `.get_preds()`? I would expect them to give the same results.
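On the `.eval()` question: in PyTorch, `model.eval()` switches layers such as `nn.Dropout` and `nn.BatchNorm1d` to inference behaviour, and fastai's tabular model can contain both. The plain-Python sketch below illustrates the BatchNorm part with a simplified, hypothetical `batch_norm` helper (no learned scale or shift): in training mode a row's output depends on the other rows in its batch, while in eval mode it uses running statistics and does not. Whether `.predict()` and `.get_preds()` each switch to eval mode for you is worth confirming in the fastai source rather than assuming.

```python
import math

def batch_norm(column, training, running_mean, running_var, eps=1e-5):
    """Hypothetical, simplified BatchNorm for one feature column (no scale/shift).

    training=True  -> normalise with the statistics of the current batch.
    training=False -> normalise with running statistics from training.
    """
    if training:
        mu = sum(column) / len(column)
        var = sum((x - mu) ** 2 for x in column) / len(column)
    else:
        mu, var = running_mean, running_var
    return [(x - mu) / math.sqrt(var + eps) for x in column]

batch = [1.0, 2.0, 3.0, 4.0, 5.0]
stats = dict(running_mean=2.0, running_var=4.0)  # made-up running statistics

# Train mode: the value 1.0 is normalised differently alone vs inside a batch.
print(batch_norm(batch, True, **stats)[0], batch_norm([1.0], True, **stats)[0])
# Eval mode: the output for 1.0 no longer depends on the rest of the batch.
print(batch_norm(batch, False, **stats)[0], batch_norm([1.0], False, **stats)[0])
```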