learn.get_preds(ds_type=DatasetType.Test) with a tabular learner returns predictions in a different order than the test set


(Marc Laugharn) #1

It would be nice to be able to get predictions in the same order as they are in my test dataframe.

Code for reference:

import pandas as pd
from fastai import *
from fastai.tabular import *

path = './'
train_df = pd.read_csv('./train.csv')
split = 40000
valid_idx = range(len(train_df)-split, len(train_df))
test_df = pd.read_csv('./test.csv')
dep_var = 'target'

data = TabularDataBunch.from_df(path, train_df, dep_var, valid_idx=valid_idx, test_df=test_df)
learn = tabular_learner(data, layers=[200,20], metrics=accuracy)

learn.data.show_batch()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(10, 1e-2)

preds, y = learn.get_preds(DatasetType.Test) # <-- I think these are in a different order than test_0, test_1, etc.

As a workaround, I am currently iterating through the test dataframe and predicting one row at a time, but this is slow and seems wrong.


#2

That is weird; they are normally in the same order. Can you check that data.show_batch(ds_type=DatasetType.Test) shows the same rows as the first rows of your test dataframe?

Note that test sets are unlabeled, so if you are saying this because the ys you get back are all 0, that isn’t a good check.
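A more direct check than looking at the returned ys is to compare the batch predictions against single-row predictions for the first few test rows. The following is only a rough sketch: `first_order_mismatch`, `toy_predict` and the toy rows are hypothetical stand-ins so that it runs without fastai. With a real learner, `batch_preds` would presumably be the `preds` from `learn.get_preds(DatasetType.Test)` and `predict_one` a small wrapper around `learn.predict(row)`; exactly which element of that return value holds the probabilities is an assumption to verify on your fastai version.

```python
import math

def first_order_mismatch(batch_preds, predict_one, rows, n_check=20, tol=1e-4):
    """Return the index of the first row whose single-row prediction
    disagrees with the batch prediction at the same position, or None
    if the first n_check rows all agree within tol."""
    for i, row in enumerate(rows[:n_check]):
        single = predict_one(row)
        batch = batch_preds[i]
        same = len(single) == len(batch) and all(
            math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
            for a, b in zip(single, batch)
        )
        if not same:
            return i
    return None

# Hypothetical stand-in for a trained model, so the sketch runs on its own.
# With fastai this would be something like:
#   lambda row: learn.predict(row)[2].tolist()   # assumption, check your version
def toy_predict(row):
    p = 1.0 / (1.0 + math.exp(-(row["x1"] - row["x2"])))
    return [1.0 - p, p]

# Toy "test set" and two candidate batch outputs: one in order, one reversed.
rows = [{"x1": float(i), "x2": float(i % 3)} for i in range(10)]
in_order = [toy_predict(r) for r in rows]
shuffled = in_order[::-1]

print(first_order_mismatch(in_order, toy_predict, rows))   # None
print(first_order_mismatch(shuffled, toy_predict, rows))   # 0
```

If this returns None on a few dozen rows, the batch predictions are very likely in dataframe order, and the slow row-by-row loop is only needed for the check itself.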