Weird Prediction results

Hi, I am experiencing something weird. I am not sure if I have done everything correctly and in the right order.
I am using the Titanic data from Kaggle with fastai v1+.

procs = [FillMissing, Categorify, Normalize]
cat_names = ['Pclass','Sex', 'Title', 'SibSp', 'Parch','Embarked','Cabin']
cont_names = ['Age', 'Fare']
dep_var = 'Survived'

data = (TabularList.from_df(train_df_new, procs=procs, cont_names=cont_names, cat_names=cat_names)
        .add_test(TabularList.from_df(test_df_new, cat_names=cat_names, cont_names=cont_names, procs=procs))

learn = tabular_learner(data, layers=[200,100], metrics=accuracy)
learn.fit_one_cycle(5, 1e-3)

preds, y = learn.get_preds(ds_type=DatasetType.Test)

My y results are all zeros (as in all dead) when predicting on the test dataset!

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

But the predicted probabilities in the second column of preds are very low:

tensor([[9.0810e-01, 9.1897e-02],
            [6.5247e-01, 3.4753e-01],
            [9.5031e-01, 4.9685e-02],
            [9.0966e-01, 9.0339e-02],
            [5.7985e-01, 4.2015e-01],
            [8.8298e-01, 1.1702e-01],

What could be going on?

I have been trying to solve a tabular classification task over the past few days, but it looks like there is a bug with the instantiation of the TabularDataBunch (see the post by Sylvain here).

So I’m surprised that you can even get this far. Which version of fastai did you install?

I have 1.0.42 installed currently.

What I also find weird is that when I use show_batch and specify the test dataset, there is a target column with all zeros. Is this correct? I mean, this is before I even train anything.




Also, can I use plain “Test” as ds_type instead of DatasetType.Test as I saw someplace on the forums?
I noticed that the results are different when I leave it blank, use “Test” and DatasetType.Test.
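
A possible explanation for the three different results, offered as a guess rather than a confirmed answer: if DatasetType is a plain Python Enum and the library picks the data loader by comparing ds_type against its members, then the string "Test" never compares equal to DatasetType.Test and would fall through to some other loader. The sketch below uses a stand-in enum and a hypothetical pick_loader function to show the effect. Neither is copied from fastai.

```python
from enum import Enum

# Stand-in for the library's DatasetType, defined here only for illustration.
DatasetType = Enum('DatasetType', 'Train Valid Test Single Fix')

def pick_loader(ds_type=DatasetType.Valid):
    """Hypothetical dispatcher that chooses a loader by comparing enum members."""
    if ds_type == DatasetType.Train:
        return 'train'
    if ds_type == DatasetType.Test:
        return 'test'
    if ds_type == DatasetType.Valid:
        return 'valid'
    # Anything unrecognised, including a plain string, ends up here.
    return 'fallback'

results = {
    'blank': pick_loader(),                  # default argument
    'string': pick_loader('Test'),           # plain string
    'enum': pick_loader(DatasetType.Test),   # enum member
}
print(results)
```

Under these assumptions, each of the three calls selects a different loader, which would match what you observed. Passing DatasetType.Test is the safe choice.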

I think that by default no labels are stored for the test set. But please check that this is also true in your case:


Therefore the y tensor does not contain the actual prediction classes. To get them you can compute the index of the maximum value in your prediction probabilities:

preds, _ = learn.get_preds(ds_type=DatasetType.Test, ordered=True)
pred_prob, pred_class = preds.max(1)
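
To see what that call produces, here is a small sketch using the six probability rows posted above. It uses NumPy instead of a torch tensor so it runs without fastai installed. With a torch tensor, preds.max(1) returns the same two results as a (values, indices) pair.

```python
import numpy as np

# The six probability rows printed earlier in the thread (columns: class 0, class 1).
preds = np.array([
    [9.0810e-01, 9.1897e-02],
    [6.5247e-01, 3.4753e-01],
    [9.5031e-01, 4.9685e-02],
    [9.0966e-01, 9.0339e-02],
    [5.7985e-01, 4.2015e-01],
    [8.8298e-01, 1.1702e-01],
])

pred_prob = preds.max(axis=1)      # highest probability in each row
pred_class = preds.argmax(axis=1)  # column index of that probability = predicted class

print(pred_class.tolist())
```

For these six rows the predicted class is 0 every time, because the first column is always the larger one. That result comes from the model's probabilities, not from the placeholder y values.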

Yes, the test set is always unlabeled in fastai (see here for more information and how to validate on a second validation set). The test set is there to quickly get predictions on unlabeled data.

Thanks! The y label is in fact blank as it should be.

Natalie’s code helped me get actual predictions out of my model.
But I got an error with ordered=True. Is it perhaps not included in the latest code versions?

ordered is only an argument in text, because the texts are sorted by their lengths (so the predictions come in a different order). In tabular, your predictions come in the same order as your dataframe, so that argument doesn’t exist.
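
A rough illustration of why sorting by length changes the order of predictions, and how the original order can be recovered. This is a NumPy sketch of the general idea with made-up texts, not the library's actual implementation. The "prediction" here is just each text's length so the reordering is easy to follow.

```python
import numpy as np

texts = ["a b c d", "a", "a b c", "a b"]             # made-up inputs
lengths = np.array([len(t.split()) for t in texts])  # token count per text

# Assume the texts are processed longest first.
sort_idx = np.argsort(-lengths, kind="stable")

# Pretend the prediction for each text is its length; these come out in sorted order.
preds_sorted = lengths[sort_idx]

# Undo the sort: the argsort of a permutation is its inverse permutation.
restore_idx = np.argsort(sort_idx)
preds_original = preds_sorted[restore_idx]

print(preds_sorted.tolist(), preds_original.tolist())
```

With tabular data no such sorting happens, so no restoring step is needed.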

How do I make sure that only the top-3 or top-2 predictions are considered during the training phase? This is in reference to multi-class or multi-label scenarios.
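
One way to read this question is as a top-k accuracy metric, where a prediction counts as correct if the true class is among the k highest-scoring classes. The sketch below shows that idea in plain NumPy with made-up numbers. The function name is hypothetical. Note that this only changes how results are scored. It does not change the loss used during training.

```python
import numpy as np

def top_k_accuracy(probs, targets, k=3):
    """Fraction of rows whose true class is among the k highest-scoring classes."""
    probs = np.asarray(probs)
    targets = np.asarray(targets)
    # Indices of the k largest scores per row (order within the k does not matter).
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == targets[:, None]).any(axis=1)
    return float(hits.mean())

# Made-up scores for 3 samples over 4 classes, with made-up true labels.
probs = [[0.10, 0.20, 0.30, 0.40],
         [0.40, 0.30, 0.20, 0.10],
         [0.25, 0.05, 0.60, 0.10]]
targets = [1, 3, 0]

print(top_k_accuracy(probs, targets, k=2), top_k_accuracy(probs, targets, k=3))
```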