I also tried this competition, cutting into my sleep hours ...
Radiologists achieve about 90% specificity at 80% sensitivity of detection. More than 90% of real-life screening exams have a prior exam available for comparison, and radiologists' AUROCs are substantially higher with a prior exam than without.
The current leader has about 80% specificity at 80% sensitivity in sub-challenges 1 and 2. Even if not better than a human, this result could easily be used to triage screening exams and prioritize their interpretation by a radiologist. If the results are truly open source as specified (code + weights), I'll try to implement that kind of triage in my own department.
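For illustration, here is a minimal sketch of what such a triage step could look like. Everything here is hypothetical (the `triage` function, the threshold value, and the exam scores are mine, not part of the challenge code); the idea is just to sort the worklist by model score and flag high-scoring exams for priority reading.

```python
def triage(scores, high_risk_threshold=0.5):
    """Split exams into priority and routine reading lists.

    scores: dict mapping exam_id -> model probability of cancer.
    Returns (priority, routine), each sorted by descending score.
    """
    order = sorted(scores, key=scores.get, reverse=True)
    priority = [e for e in order if scores[e] >= high_risk_threshold]
    routine = [e for e in order if scores[e] < high_risk_threshold]
    return priority, routine

# Hypothetical scores for three screening exams:
priority, routine = triage({"exam_a": 0.91, "exam_b": 0.12, "exam_c": 0.64})
# priority -> ["exam_a", "exam_c"], routine -> ["exam_b"]
```

The radiologist still reads every exam; the model only changes the reading order, so a false negative is delayed rather than missed.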
As I said, as a radiologist I can easily improve my own ROC curve by comparing the images with the prior exam. Surprisingly, I don't see much difference between the best AUROC in sub-challenge 2 and sub-challenge 1. Another important factor for improving AUROC is the spatial 3D cross-correlation of a suspected feature between views (MLO and CC): a suspicious feature that is spatially correlated in both views is a lot more worrisome than one seen in only one view.
Consequently, I think a proper comparison implementation, with optimal resolution, decent segmentation, and preprocessed normalization, using 2 channels (current image, prior image) or 4 channels (current MLO, prior MLO, current CC, prior CC), could perform as well as or even better than a radiologist. Inattention/variance is the weakness of almost any human task. As radiologists do, this comparison technique could also be applied without a prior exam, simply by comparing the current exam with the contralateral current exam, using a separately trained network (sensitivity to comparison changes is quite different).
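A minimal sketch of the 2-channel input preparation, assuming the current and prior images are already registered and the same size (the per-image normalization here is just one plausible choice, not a claim about what the challenge entries did):

```python
import numpy as np

def make_comparison_input(current, prior):
    """Stack a current and prior mammogram as a 2-channel array (H, W, 2)."""
    def norm(img):
        # Zero-mean, unit-variance per image, so the network sees
        # comparable intensity ranges from both acquisitions.
        img = img.astype(np.float32)
        return (img - img.mean()) / (img.std() + 1e-8)
    return np.stack([norm(current), norm(prior)], axis=-1)

x = make_comparison_input(np.random.rand(512, 512), np.random.rand(512, 512))
# x.shape -> (512, 512, 2)
```

The 4-channel variant (current MLO, prior MLO, current CC, prior CC) is the same idea with `np.stack` over four normalized images.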
I tried a 3-channel variant of this implementation, training from scratch, but I was stuck with bad overfitting since there are fewer than 1000 cancers in the dataset. Fine-tuning a pretrained model is interesting, but the lack of resolution probably explains the low pAUROC and the low specificity at 90% sensitivity, despite a decent AUROC. I don't really know how to use a low-resolution pretrained network on a higher-resolution image without resizing the image. I've read that the weights/features of the first convolutional layers can usually be reused at higher resolution, but I don't know how to do this in Caffe or Keras. Any ideas, if the dataset can still be used after the competition?
That competition was a great way to learn.