I am using fastai v1 (currently can't update, unfortunately).
I have trained a tabular model and was about to use it for prediction on a test set.
Before that, I tried learn.get_preds on the validation set;
then I added the validation set itself as a test set (using data.valid_ds.x.inner_df) and called get_preds again,
as a sanity check that I understand how predictions work in fastai.
Here is the code:
```python
# Predictions on the validation set
# (get_preds returns (preds, targets), so unpack before argmax)
validation_preds, _ = learn.get_preds(DatasetType.Valid)
validation_bin_preds = torch.argmax(validation_preds, dim=1)

# Build a test set as a TabularList, intended to be equivalent to the validation set
cat_names = ['a', 'b']
cont_names = ['c', 'd']
procs = [FillMissing, Categorify, Normalize]
my_list = TabularList.from_df(df=data.valid_ds.x.inner_df,
                              cat_names=cat_names,
                              cont_names=cont_names,
                              procs=procs)
data.add_test(my_list)  # add_test should get a TabularList

test_preds, _ = learn.get_preds(DatasetType.Test)
test_bin_preds = torch.argmax(test_preds, dim=1)
```
The results (validation_bin_preds vs. test_bin_preds) were very different: validation_bin_preds predicted about 900 positives, but predicting on the same examples as a test set (test_bin_preds) gave only about 150 positives!
- (This is binary 0/1 classification.)
- (I am not referring to the order of the predictions or to the 'target' part of the result.)
Do you know what could be the reason for that?
I figured I have to build a TabularList for my test set, but since I am apparently already missing something on the validation set, I am not sure what to do…
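One thing I suspect (this is just my guess, not confirmed): by passing procs again when building the test TabularList, the preprocessing statistics might be recomputed on the new frame instead of reusing the ones fitted on the training set, so the model would see different inputs. A minimal non-fastai sketch of why that would matter, with plain pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical illustration: normalizing the same column with statistics
# computed from two different frames gives the model different inputs.
train = pd.DataFrame({'c': [0.0, 1.0, 2.0, 3.0]})
valid = pd.DataFrame({'c': [10.0, 11.0]})

def normalize(col, mean, std):
    return (col - mean) / std

# Stats taken from the training frame (what the fitted procs would reuse)
a = normalize(valid['c'], train['c'].mean(), train['c'].std())
# Stats recomputed on the validation frame itself (what refitting procs would do)
b = normalize(valid['c'], valid['c'].mean(), valid['c'].std())

print(np.allclose(a, b))  # False: the two pipelines feed different values
```

If that is the cause, the predictions on the "test" copy of the validation data would naturally diverge, but I would appreciate confirmation of whether this is what is happening here.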