I'd like to understand why my Kaggle test-set score (0.94) is so much lower than the ROC AUC computed on my validation set (0.99). I'm using fastai 1.0.34.
Are the results returned by learn.get_preds(ds_type=DatasetType.Test) guaranteed to be in the same order as those in data.test_ds.to_df()? Here is my code in context:
data = ImageDataBunch.from_csv(csv_labels='train_labels.csv', suffix='.tif', path=DATA,
                               folder='train', test='test', ds_tfms=None,
                               bs=BATCH_SIZE, size=SIZE2).normalize(imagenet_stats)

# ... training of learn omitted ...

testprobs, _ = learn.get_preds(ds_type=DatasetType.Test)    # second value is just dummy labels for the test set
testdf = data.test_ds.to_df()
testdf.columns = ['id', 'label']
testdf['label'] = testprobs[:, 1]
testdf['id'] = testdf['id'].apply(lambda fp: Path(fp).stem)  # file name without folder or .tif suffix
testdf.to_csv(SUBM/'rn34s2x.csv', index=False, float_format='%.7f')
Is this the best way (correct, clear, not fragile) to prepare a test set submission using fastai 1.0?
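One alternative I've been considering, in case to_df() and get_preds() ever disagree, is to build the id column straight from data.test_ds.items. This is only a sketch, assuming that attribute holds the test image paths in the order the test DataLoader iterates them (the test set isn't shuffled), and the output filename is just for the check:

import pandas as pd
from pathlib import Path

probs, _ = learn.get_preds(ds_type=DatasetType.Test)            # second value is dummy labels
ids = [Path(p).stem for p in data.test_ds.items]                 # paths taken straight from the test ItemList
subm = pd.DataFrame({'id': ids, 'label': probs[:, 1].numpy()})
subm.to_csv(SUBM/'rn34s2x_items.csv', index=False, float_format='%.7f')

If this file and my original submission match row for row after sorting by id, the ordering assumption in the code above should be safe.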
My local AUC is calculated on the validation set by:

from sklearn.metrics import roc_auc_score

def auc_score(y_score, y_true):
    # use as a metric: AUC of the positive-class probability
    return torch.tensor(roc_auc_score(y_true, y_score[:, 1]))

probs, val_labels = learn.get_preds()   # validation set by default
auc_score(probs, val_labels)            # 0.99
accuracy(probs, val_labels)             # 0.985
As mentioned above, Kaggle's AUC on its test set is 0.94. The problem is not so much that the Kaggle score is low (though I'd certainly like a higher one) as that I don't have a reliable way to evaluate my experiments.
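To get a feel for how much of the 0.99 vs 0.94 gap could be explained by sampling noise alone, I've been thinking of bootstrapping the validation AUC. This is a sketch that reuses probs and val_labels from get_preds() above:

import numpy as np
from sklearn.metrics import roc_auc_score

scores = probs[:, 1].numpy()
labels = val_labels.numpy()
rng = np.random.RandomState(0)
boot = []
for _ in range(1000):
    idx = rng.randint(0, len(labels), len(labels))   # resample validation rows with replacement
    if len(np.unique(labels[idx])) < 2:              # skip degenerate resamples with one class only
        continue
    boot.append(roc_auc_score(labels[idx], scores[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f'validation AUC 95% bootstrap interval: [{lo:.4f}, {hi:.4f}]')

If 0.94 sits well outside that interval, the gap is more than validation sampling noise.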
Thanks so much for any hints.
P.S. The local AUC and accuracy stay about the same even when I rebuild the DataBunch with a fresh train/validation split.