Thanks for taking a look. I retrained for 3 epochs from scratch to make sure there is no mixing of datasets. The validation set is 20% of the training images.
Here are my accuracy and ROC computations:
# Predict the validation set (without TTA)
probs, val_labels = learn.get_preds(ds_type=DatasetType.Valid)
Out: (tensor(0.9523), tensor(0.9880))  # (accuracy, AUROC) on the validation set
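For reference, here is a minimal sketch of how those two numbers can be derived from the `get_preds` output (the helper name `accuracy_and_auroc` is my own; I'm assuming binary classification with `probs[:, 1]` as the positive-class probability and `sklearn` for the AUROC):

```python
import torch
from sklearn.metrics import roc_auc_score

def accuracy_and_auroc(probs, labels):
    """Compute accuracy and AUROC from class probabilities and integer labels."""
    preds = probs.argmax(dim=1)                  # predicted class per sample
    acc = (preds == labels).float().mean()       # fraction of correct predictions
    auroc = roc_auc_score(labels.numpy(), probs[:, 1].numpy())  # rank-based metric
    return acc, auroc

# usage with the tensors returned by learn.get_preds:
# acc, auroc = accuracy_and_auroc(probs, val_labels)
```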
Here is the submission to Kaggle:
testprobs, test_labels = learn.get_preds(ds_type=DatasetType.Test)  # Predicting without TTA
testdf = data.test_ds.to_df()
testdf.columns = ['id', 'label']
testdf['label'] = testprobs[:, 1]
testdf['id'] = testdf['id'].apply(lambda fp: Path(fp).stem)
testdf.to_csv(SUBM/'rocTest.csv', index=False, float_format='%.9f')
Kaggle score = 0.9478, about 4% lower than the AUROC (0.9880) computed on my local validation set.
Here is the ROC computation and graph (code copied from elsewhere):
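The copied code is probably along these lines, a sketch using `sklearn` and `matplotlib` (the function name `plot_roc` and the plot styling are my own choices, not the original code):

```python
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

def plot_roc(labels, pos_probs):
    """Plot the ROC curve and return its area, given true labels and
    positive-class probabilities (e.g. val_labels and probs[:, 1])."""
    fpr, tpr, _ = roc_curve(labels, pos_probs)  # false/true positive rates per threshold
    roc_auc = auc(fpr, tpr)                     # area under the ROC curve
    plt.plot(fpr, tpr, label=f"ROC curve (area = {roc_auc:.4f})")
    plt.plot([0, 1], [0, 1], linestyle="--")    # chance diagonal
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend(loc="lower right")
    return roc_auc
```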
I would very much like to track the effect of my experiments on the Kaggle score. Thanks for any hints!