Test set predictions not comparable with Validation set predictions

I built a minimal classifier with the tabular module to tackle this competition. It trains and validates with reasonable accuracy, and the validation predictions contain a mixture of the classes [0, 1] as expected. But when I inspect the predictions on the TEST data there are no 1s, only zeros…

I looked into the source of get_preds but had a hard time grasping it and could not find a way to proceed with debugging. I would highly appreciate any pointers on my issue, or suggestions on what to read up on. Below are my code and details.

Full code: https://github.com/ajthinking/kaggle-santander/blob/master/tabular.ipynb
Issue: https://github.com/ajthinking/kaggle-santander/issues/2

Thank you!

You are looking at the labels for the validation and test sets, not the predictions. get_preds returns a tuple, and the predictions are at index zero. Labels for a test set are always zero; they are just placeholders.

Yes, so if I look at index 0 instead I will have all the probabilities. Looks something like [[0.9, 0.1], [0.95, 0.05], ...], which all translate to class 0 since the larger probability sits at index 0 of each row (?).

EDIT: oh NOW I understand!! Thanks !!!
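The translation from probability rows to classes can be sketched in plain Python (the two example rows are taken from the post above; the loop itself is just an illustration, not the thread's code):

```python
# rows of class probabilities, as in the example above
probs = [[0.9, 0.1], [0.95, 0.05]]

# predicted class = index of the largest probability in each row
classes = [max(range(len(row)), key=row.__getitem__) for row in probs]
print(classes)  # [0, 0] -- both rows peak at index 0, hence class 0
```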

Ended up doing it like this:

# take the predictions (index 0 of the tuple returned by get_preds)
probs = learn.get_preds(ds_type=DatasetType.Test)[0]

# predicted class = index of the largest probability in the row
def probs2class(item):
    return max(range(len(item)), key=item.__getitem__)

test_df = pd.DataFrame({'ID_code': df_test['ID_code'],
                        'target': list(map(probs2class, probs))})

Hope to find a helper to replace the mapping hassle with a probs2class def, though.
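Since the probabilities come back as a tensor, a per-row argmax can replace the helper entirely; torch's Tensor.argmax(dim=1) and NumPy's argmax(axis=1) both do this in one call. A minimal sketch with NumPy (the array values here are illustrative, not from the competition data):

```python
import numpy as np

# illustrative rows of class probabilities
probs = np.array([[0.9, 0.1],
                  [0.05, 0.95],
                  [0.7, 0.3]])

# index of the largest probability along each row = predicted class
classes = probs.argmax(axis=1)
print(classes.tolist())  # [0, 1, 0]
```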
