Kaggle Histopathological Cancer Detection

It seems like everything is properly done.

Is the Kaggle score calculated on your validation set? Or on the unseen test set?

Kaggle score is from the test set of images. The mysteries are:

  1. Kaggle AUC on test set is much lower than my AUC on my validation set.
  2. Accuracy and AUC scores on my validation set are quite different. (Should these be close?) The Kaggle AUC score more closely resembles the accuracy score on my validation set.

I was not confident in my code for creating the Kaggle submission, so I even rewrote it to predict each image separately and write the submission file line by line. No difference.
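For reference, here is a minimal sketch of how such a submission could be built, assuming a fastai v1 Learner named `learn` whose DataBunch already includes the Kaggle test images; the `id`/`label` column names follow the competition's sample_submission.csv, and the `.stem`-based id extraction assumes the test items are pathlib Paths:

```python
import pandas as pd
from fastai.basic_data import DatasetType

# Predict over the whole test set at once; preds are class probabilities.
preds, _ = learn.get_preds(ds_type=DatasetType.Test)
prob_cancer = preds.numpy()[:, 1]                 # probability of the positive class

# Image ids are the file names without the .tif extension.
ids = [p.stem for p in learn.data.test_ds.items]
pd.DataFrame({'id': ids, 'label': prob_cancer}).to_csv('submission.csv', index=False)
```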

Hi @archit, I wonder, did you use a very small learning rate? After training for a while I found the performance is not improving and lr_find gives a flat line (except at large lr), but my lr is already ~1e-8, which is pretty small…

1e-8 is very small. Are you sure you have set up the neural network properly? Also, try scaling the images up to 256x256. That improves ResNet accuracy significantly.
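For what it's worth, here is a rough sketch of how the rescaling could be specified when building the DataBunch in fastai v1; `path`, the folder layout, and the CSV name are placeholders matching this competition's files:

```python
from fastai.vision import ImageDataBunch, get_transforms, imagenet_stats

# size=256 makes fastai rescale the 96x96 patches to 256x256 on the fly.
data = (ImageDataBunch.from_csv(path, folder='train', csv_labels='train_labels.csv',
                                suffix='.tif', ds_tfms=get_transforms(),
                                size=256, bs=32)
        .normalize(imagenet_stats))
```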

Thanks for the reply!
At the beginning (after unfreezing), lr_find tells me I can use ~1e-3, so I run the 1cycle policy for 3 epochs, then decrease the lrs to ~1e-5 (as lr_find suggests) and run the 1cycle policy for another 6 epochs. After that I can only use very small lrs, otherwise the performance does not improve. Maybe I should do one large cycle rather than gradually increasing the cycle length… But I think I am setting things up correctly…
Scaling up is an interesting idea; I hadn't thought of that. Thank you for the tips!
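In case it helps to see the schedule spelled out, here is a minimal sketch of the steps described above in fastai v1; the learning rates are just the ones quoted in the post, not recommendations:

```python
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()                  # suggested ~1e-3 at this point
learn.fit_one_cycle(3, max_lr=1e-3)    # first 1cycle run, 3 epochs

learn.lr_find()
learn.recorder.plot()                  # suggested ~1e-5 after the first cycle
learn.fit_one_cycle(6, max_lr=1e-5)    # second 1cycle run, 6 epochs
```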

I’m getting the same thing. Fastai score 99.4%. Kaggle 94.9%.

Here’s some more investigation of the Test/Validation discrepancy, though I don’t know what to conclude from it.

  1. Calculation of the means and standard deviations of the three image channels (pixel values) showed no significant differences between the full Training set and the Test set. The images have the same gross characteristics.

  2. Calculation of the cancer frequency by thresholding the predicted probability at 0.5 (a rough sketch of both checks follows the numbers below)…

Fraction of actual cancer in Training set = 0.40548801272582663
Fraction of predicted cancer in Validation set = 0.40247699125099423 (omitted from original post)
Fraction of predicted cancer in Test set = 0.33198162135820947
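As a sketch of both checks, and only a sketch (the original code is not shown), the following assumes a fastai v1 Learner named `learn` with the Kaggle test set attached; file and variable names are placeholders matching the competition's files:

```python
import numpy as np
import pandas as pd
from PIL import Image
from fastai.basic_data import DatasetType

def channel_stats(paths):
    """Running per-channel mean/std of pixel values over a set of .tif patches."""
    s, s2, n = np.zeros(3), np.zeros(3), 0
    for p in paths:
        img = np.asarray(Image.open(p), dtype=np.float64) / 255.0   # H x W x 3
        s  += img.sum(axis=(0, 1))
        s2 += (img ** 2).sum(axis=(0, 1))
        n  += img.shape[0] * img.shape[1]
    mean = s / n
    return mean, np.sqrt(s2 / n - mean ** 2)          # population std

train_mean, train_std = channel_stats(learn.data.train_ds.items)   # items assumed to be Paths
test_mean,  test_std  = channel_stats(learn.data.test_ds.items)

# Actual label frequency from the training CSV, and predicted frequency
# (probability > 0.5) on the validation and test sets.
train_frac = pd.read_csv('train_labels.csv')['label'].mean()
val_preds, _  = learn.get_preds(ds_type=DatasetType.Valid)
test_preds, _ = learn.get_preds(ds_type=DatasetType.Test)
val_frac  = (val_preds[:, 1] > 0.5).float().mean().item()
test_frac = (test_preds[:, 1] > 0.5).float().mean().item()
print(f'cancer fraction - train (actual): {train_frac:.4f}, '
      f'valid (predicted): {val_frac:.4f}, test (predicted): {test_frac:.4f}')
```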

This Test-set prediction gets a Kaggle score of 0.9536; Validation AUC = 0.9923, Validation accuracy = 0.9659.

So it looks to me like the Kaggle Test and Training sets are NOT uniform samplings of the same source image set, as evidenced by the different predicted cancer rates. The Test set is biased toward non-cancer.

All these tests were done in a single session with a DataBunch created once, so there is no chance of leakage of Training images into the Validation set.

Any ideas?

This AUROC metric implementation is wrong. For the right way see…
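One common pitfall here is computing AUROC per batch and averaging; AUC is not decomposable over batches, so that distorts the score. Purely as an illustration (not necessarily what the linked post describes), it can instead be computed once over the whole validation set with sklearn, assuming a Learner named `learn`:

```python
from sklearn.metrics import roc_auc_score
from fastai.basic_data import DatasetType

# Gather all validation predictions and labels, then score in a single call.
preds, targets = learn.get_preds(ds_type=DatasetType.Valid)
auc = roc_auc_score(targets.numpy(), preds.numpy()[:, 1])   # probability of the positive class
print(f'Validation AUROC: {auc:.4f}')
```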


Can you please elaborate on this?