Thanks,
The `y` size of 8 was because I was using a fraction of the full dataset.
On the full dataset I still can't work out how to predict on all the test data in one go and get the actual classes the predictions pertain to
(much thanks to @willismar, who pointed out how to pass in classes here: TabularDataBunch Error: "Your validation data contains a label that isn't present in the training set, please fix your data.").
classes = list(df[dep_var].unique())
classes.sort()
data = TabularDataBunch.from_df(path, df=df, dep_var=dep_var, valid_idx=valid_idx, procs=procs, cat_names=cat_vars, cont_names=cont_vars, classes=classes, test_df=df_test)
It seems this gives exactly the same result as the class generation in fastai.core:
def uniqueify(x:Series)->List:
    "Return sorted unique values of `x`."
    res = list(OrderedDict.fromkeys(x).keys())
    res.sort()
    return res
keys = uniqueify(df[dep_var].values)
classes == keys
>> True
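As a sanity check on the claim above, here is a minimal self-contained sketch (with made-up toy labels) showing that `uniqueify` and the manual `sorted`-unique approach agree:

```python
from collections import OrderedDict

def uniqueify(x):
    "Return sorted unique values of `x` (mirrors fastai.core's version)."
    res = list(OrderedDict.fromkeys(x).keys())
    res.sort()
    return res

# toy stand-in for df[dep_var].values
labels = ["cat", "dog", "cat", "bird", "dog"]

classes = list(set(labels))   # manual approach from above
classes.sort()

assert uniqueify(labels) == classes   # both give ['bird', 'cat', 'dog']
```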
Then, after training…
indexes = list(df_test.index.values)
preds, y = learn.get_preds(DatasetType.Test)
assert len(indexes) == len(preds)

d = {}
for indx, pred in zip(indexes, preds):
    max_idx = np.argmax(pred)
    # index into the classes we defined above to get the predicted class
    d[indx] = classes[max_idx]
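The per-row argmax loop above can also be done in one vectorised pass. A minimal sketch with a made-up probability matrix and made-up indexes (standing in for `preds` and `df_test.index.values`):

```python
import numpy as np

classes = ["bird", "cat", "dog"]   # hypothetical sorted class list
preds = np.array([                 # toy stand-in for get_preds output
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
])
indexes = [10, 11, 12, 13]         # toy stand-in for df_test.index.values

# argmax over the class axis for all rows at once
max_idxs = preds.argmax(axis=1)
d = {i: classes[m] for i, m in zip(indexes, max_idxs)}
# d == {10: 'cat', 11: 'bird', 12: 'dog', 13: 'cat'}
```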
But if I compare the predictions from the method above against row-by-row predictions, for the same index in the test dataframe, the predicted classes are different:
d_rbr = {}
for idx, row in df_test.iterrows():
    pred = learn.predict(row)
    d_rbr[idx] = str(pred[0])

# for any given index, this is often not true
assert d[idx_val] == d_rbr[idx_val]
And I am a bit stuck on how to reliably get class results out of `preds, y = learn.get_preds(DatasetType.Test)`.
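One thing that may be worth ruling out (this is a guess, not a confirmed diagnosis): if the probability columns returned by `get_preds` are ordered by the learner's internal class list, decoding them with a *differently ordered* hand-built list would silently mislabel everything, so it could be worth checking that the manual `classes` matches `learn.data.classes`. A toy, fastai-free illustration of the failure mode, with hypothetical class orders:

```python
import numpy as np

internal_classes = ["dog", "cat", "bird"]   # hypothetical order used by the model
manual_classes   = ["bird", "cat", "dog"]   # sorted list built by hand

# one row of probabilities, ordered by internal_classes: the model "means" dog
probs = np.array([0.7, 0.2, 0.1])
idx = probs.argmax()                        # -> 0

internal_classes[idx]   # 'dog'  : correct decoding
manual_classes[idx]     # 'bird' : silent mislabel from the mismatched order
```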