Kaggle Histopathological Cancer Detection

Yes I did. I’m also underfitting and I think there’s room for improvement. Also, the evaluation page says you only have to look at the center 32x32 patch of each image, which I totally ignored.

I’m trying this competition too, for practice during DL1. Can I ask what you set your metrics to? The evaluation page says scoring uses ROC, so my first attempt was just to do:

from sklearn.metrics import roc_curve
metrics = [roc_curve]

However, as soon as it tried to go through the validation set, I got the error:

ValueError: continuous-multioutput format is not supported

This error came from inside roc_curve. I’m not sure how to proceed. Should I try to work out why it’s not working, or am I just using the wrong metric? Any help would be much appreciated.

I just used accuracy, since it works well enough in this case (the training data is not skewed). I’ll report how it fares with ROC scoring later, though. Also, the metric should be

from sklearn.metrics import roc_auc_score

Not totally sure, though. ROC is just a curve; the metric is the area under that curve.
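
To make that concrete, here is a minimal sketch with made-up labels and scores showing how roc_curve and roc_auc_score relate:

import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

# made-up ground truth and positive-class probabilities
y_true  = np.array([0, 0, 1, 1, 0, 1])
y_proba = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

# roc_curve returns the points of the curve, one per threshold...
fpr, tpr, thresholds = roc_curve(y_true, y_proba)

# ...and the competition metric is the area under that curve
print(auc(fpr, tpr))                    # ~0.889
print(roc_auc_score(y_true, y_proba))   # same value, computed directly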


OK, thanks a lot. I’m still a newbie at applying Fast.ai to new problems.

The metric you suggested gave the same error. I guess the issue is that it isn’t suitable for the live metric updates during training. However, I can always feed the predictions and the ground truth in manually after training to get the same metric the Kaggle competition uses.

I believe you could write your own wrapper for roc_auc_score that takes the same arguments as accuracy. I’m not sure whether the validation step of the training loop feeds the metric the actual probabilities or the hard predictions, though.


See https://forums.fast.ai/t/kaggle-histopathologic-cancer-detection/30669 where it is already being discussed.

I guess I’m also not on the latest version of the course. Is there a way to join it and see the post?

I believe you could write your own wrapper for roc_auc_score that takes the same arguments as accuracy.

Not possible. ROC-AUC is computed using a changing threshold. If you feed it probabilities, the threshold is the minimal probability at which you tag a sample as positive. So a sample for which the model assigned a probability of 0.56 will be classified positive with a threshold of 0.5, and negative with a threshold of 0.6. Each point on the ROC curve corresponds to a specific threshold, and the ROC-AUC score is the area underneath the curve. So basically you have to pass continuous scores to compute ROC-AUC, not hard predictions.
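
A small sketch with made-up values shows the difference: binarising the scores before scoring throws away the ranking information that ROC-AUC measures.

import numpy as np
from sklearn.metrics import roc_auc_score

y_true  = np.array([0, 1, 0, 1, 1, 0])
y_proba = np.array([0.30, 0.42, 0.20, 0.90, 0.62, 0.10])

# continuous scores: every positive outranks every negative, so the AUC is perfect
print(roc_auc_score(y_true, y_proba))                        # 1.0

# hard 0/1 predictions at a fixed 0.5 threshold: the ranking is lost
print(roc_auc_score(y_true, (y_proba > 0.5).astype(int)))    # ~0.83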

Not much in that post really, except a call to participate. Kaggle’s rules mean solution discussion should ideally happen in the Kaggle forums.


Barashe,

I have been using this as a training metric:

from sklearn.metrics import roc_auc_score
import torch

def auc_score(y_score, y_true):
    return torch.tensor(roc_auc_score(y_true, y_score[:, 1]))  # column 1 = positive-class score
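
Such a metric slots into the learner’s metrics list like any other. A sketch, assuming fastai v1’s create_cnn with a resnet34 (both assumptions, not necessarily the actual setup):

from fastai.vision import *

# the wrapper above is reported alongside accuracy at each epoch
learn = create_cnn(data, models.resnet34, metrics=[accuracy, auc_score])

Note that, as far as I know, fastai averages a metric over validation batches, so the AUC reported during training is only an approximation of the AUC computed on the whole validation set at once.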

But I merely used the ROC computation from a Kaggle kernel without fully understanding the ROC measure.

Is there a correct way to simulate Kaggle’s scoring using our own validation set? My validation AUROC measure is always much higher than Kaggle’s (on their test set).

Thanks!

Well, if I understand you correctly, you wish to assess your trained model’s ROC-AUC on your own validation set. If so, you need to use your model to predict probabilities for the validation set.
Then the only thing you need is:

from sklearn.metrics import roc_auc_score
roc_auc_score(y_true, y_proba)

Here I assume y_true holds the correct labels and y_proba holds your model’s probability predictions on the validation set.

Thanks for responding. Yes, that is exactly the formula I am using. The mystery is that the ROC-AUC on my validation set is .99, while Kaggle’s score on the test set is .94. I have made sure there’s no leakage of the validation set into the training set.

Try plotting the curve, and post the plot here.

Thanks for taking a look. I re-trained for 3 epochs from the beginning in order to make sure there is no mixing of datasets. The validation set is 20% of the training images.

Here are my accuracy and ROC computations:

def auc_score(y_score, y_true):
    return torch.tensor(roc_auc_score(y_true, y_score[:, 1]))

# Predict the validation set (without TTA)
probs, val_labels = learn.get_preds(ds_type=DatasetType.Valid)
accuracy(probs, val_labels), auc_score(probs, val_labels)

Out: (tensor(0.9523), tensor(0.9880))

Here is my submission to Kaggle:

# Predict the test set (without TTA)
testprobs, test_labels = learn.get_preds(ds_type=DatasetType.Test)
testdf = data.test_ds.to_df()
testdf.columns = ['id', 'label']
testdf['label'] = testprobs[:, 1]
testdf['id'] = testdf['id'].apply(lambda fp: Path(fp).stem)
testdf.to_csv(SUBM/'rocTest.csv', index=False, float_format='%.9f')

Kaggle score = 0.9478, 4% lower than the AUROC calculated from my local validation set.

Here is the ROC computation and graph (I copied this code from a Kaggle kernel):
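
In outline, such a computation looks like this (a sketch, assuming the probs and val_labels tensors from the get_preds call above):

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# ROC curve of the validation predictions (positive-class probabilities in column 1)
fpr, tpr, _ = roc_curve(val_labels.numpy(), probs.numpy()[:, 1])

plt.plot(fpr, tpr, label=f'validation AUC = {auc(fpr, tpr):.4f}')
plt.plot([0, 1], [0, 1], linestyle='--')   # chance line
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()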

I would very much like to track the effect of experiments on Kaggle score. Thanks for any hints!


It seems like everything is properly done.

Is the Kaggle score calculated on your validation set? Or on the unseen test set?

The Kaggle score is from the test set of images. The mysteries are:

  1. Kaggle’s AUC on the test set is much lower than the AUC on my own validation set.
  2. The accuracy and AUC scores on my validation set are quite different. (Should these be close? See the sketch below.) The Kaggle AUC score more closely resembles the accuracy score on my validation set.
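
A small sketch with made-up numbers of why accuracy and AUC need not be close: AUC only measures how well positives are ranked above negatives, while accuracy applies a fixed 0.5 cutoff.

import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

# every positive outranks every negative, but one positive scores just below 0.5
y_true  = np.array([1, 1, 1, 0, 0, 0])
y_proba = np.array([0.70, 0.55, 0.48, 0.46, 0.30, 0.20])

print(roc_auc_score(y_true, y_proba))                       # 1.0
print(accuracy_score(y_true, (y_proba > 0.5).astype(int)))  # ~0.83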

I was not confident in my code for creating the Kaggle submission, so I even rewrote it to predict each image separately and write the submission file line by line. No difference.

Hi @archit, I wonder, did you use a very small learning rate? After training for a while I found the performance was not improving and lr_find gave a flat line (except at large learning rates), but my lr is already ~1e-8, which is pretty small…

1e-8 is very small. Are you sure you have set up the neural network properly? Also, try scaling the images up to 256x256. That improves ResNet accuracy a lot.
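
A sketch of where that size goes when building the data, assuming fastai v1’s ImageDataBunch.from_csv and this competition’s train_labels.csv / .tif layout (the path, folder name, validation split, and batch size are placeholders):

from fastai.vision import *

data = ImageDataBunch.from_csv(
    path, folder='train', csv_labels='train_labels.csv', suffix='.tif',
    valid_pct=0.2, ds_tfms=get_transforms(),
    size=256,   # upscale the 96x96 tiles before they reach the network
    bs=64,
).normalize(imagenet_stats)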

Thanks for the reply!
At the beginning (after unfreeze), lr_find tells me I can use ~1e-3. I run the 1cycle policy for 3 epochs, then decrease the lrs to ~1e-5 (as lr_find suggests) and run the 1cycle policy for another 6 epochs. After that I can only use very small lrs, otherwise the performance does not improve. Maybe I should do one large cycle rather than gradually increasing the cycle length… But I think I am setting things up correctly…
Scaling up is an interesting idea, I hadn’t thought about that, thank you for the tip!
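
For reference, a sketch of that schedule in fastai v1 (the epoch counts and learning rates are the ones described above; everything else is assumed):

learn.unfreeze()
learn.fit_one_cycle(3, max_lr=1e-3)   # lr_find suggested ~1e-3 here
learn.fit_one_cycle(6, max_lr=1e-5)   # afterwards only much smaller lrs seem to help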

I’m getting the same thing. Fastai score 99.4%. Kaggle 94.9%.

Here’s some more investigation of the Test/Validation discrepancy, though I don’t know what to conclude from it.

  1. Calculating the means and standard deviations of the three image channels (pixel values) showed no significant differences between the full Training set and the Test set. The images have the same gross characteristics.

  2. Calculation of cancer frequency as the fraction of predicted probabilities > .5 (a sketch of this computation appears below the numbers)…

Fraction of actual cancer in Training set = 0.40548801272582663
Fraction of predicted cancer in Validation set = 0.40247699125099423 (omitted from original post)
Fraction of predicted cancer in Test set = 0.33198162135820947

The latter Test prediction gets a Kaggle score = .9536, Validation AUC = .9923. Validation accuracy = .9659.
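
A sketch of how such fractions can be computed, assuming the probs and testprobs tensors from the earlier get_preds calls (treating class index 1 as cancer, and data.train_ds.y.items holding the encoded labels, are assumptions):

# actual cancer rate in the training labels vs. predicted rate at a 0.5 cutoff
train_frac = (data.train_ds.y.items == 1).mean()
valid_frac = (probs[:, 1] > 0.5).float().mean()
test_frac  = (testprobs[:, 1] > 0.5).float().mean()
print(train_frac, valid_frac, test_frac)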

So it looks to me like the Kaggle Test and Training sets are NOT uniform samples of the same source image set, as evidenced by the different predicted rates of cancer. The Test set appears biased toward non-cancer.

All these tests were done in a single session with a DataBunch created once, so there is no chance of leakage of Training images into Validation images.

Any ideas?