Medical Imaging

jeff · November 1, 2016, 5:54pm

@jeremy asked me to create this topic based on the questions I asked in the lesson 1 topic about how to approach the Digital Mammography DREAM Challenge data set. The challenge asks participants to help improve the diagnostic accuracy of mammograms. Currently, too many patients get called back for further inspection, which can be very stressful for the patient. Submitted models should predict whether the breast will be cancerous within a year. The data set includes 640,000 de-identified digital mammography images from over 86,000 patients. Another part of the challenge is to see if adding the patient’s clinical/demographic data, and, if available, previous mammograms, improves accuracy. The mammogram images are VERY high resolution (in the megapixels range, at least).

My original questions for @jeremy were:

Since the digital mammography challenge asks whether the mammogram image PLUS clinical data would improve diagnostic accuracy, how would you architect the neural network? Would it be a wide and deep model (as described in https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html )? Is that how you solved similar problems at Enlitic (if you’re allowed to say)?
Since medical images are usually VERY high resolution, did you find a certain range of downsampled pixel resolutions and training batch sizes that worked well enough for you? Does it depend on the type of medical image?

chris · November 9, 2016, 4:39am

Slightly off topic, but I wanted to mention how much I enjoyed reading the research paper accompanying the blog post. I would love to know more about recommender systems, any pointers?

jeff · November 9, 2016, 4:47am

I took the recommender systems course on Coursera years ago and learned a lot.

chris · November 9, 2016, 4:03pm

I saw this article yesterday and I can’t help connecting the dots: Circadian rhythms and cancer: potential mechanisms. Could I build a recommender system that helps people entrain circadian rhythm? Prevention is better than early detection, right?

jeremy · November 9, 2016, 8:10pm

@jeff here are the answers to these two questions:

I don’t see any upside in using the rather complex Google wide/deep approach, compared to just grabbing the activations from a late layer of a CNN, concatenating them with the patient data (possibly after doing some dimensionality reduction on the CNN layer), and then chucking it all into a random forest. But I haven’t seen rigorous comparisons of these techniques.
Dealing with high res images is an open question. Generally, you want to run a detector on a down-sampled low-res image that can find, with some level of accuracy, all of the ‘interesting’ areas. Then crop out high-res versions of small areas around each of those interesting areas, and make that the input to a 2nd model.
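
To make the two suggestions above concrete, here is a minimal sketch rather than a tested pipeline. Everything in it is a stand-in: `cnn_features`, `clinical`, `labels`, `centers`, and the helper `crop_patches` are hypothetical names, the arrays are random data in place of real CNN activations, patient metadata, and mammograms, and the 8x downsampling factor and 128-pixel patch size are arbitrary assumptions. The optional dimensionality reduction step is left out.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# --- Answer 1: CNN activations + patient data -> random forest ---
# Random stand-ins: in practice cnn_features would be activations from a late
# CNN layer and clinical would be the patient metadata (both hypothetical).
n_patients = 200
cnn_features = rng.normal(size=(n_patients, 64))
clinical = rng.normal(size=(n_patients, 5))
labels = rng.integers(0, 2, size=n_patients)

# Concatenate per-patient feature vectors side by side, then fit the forest.
combined = np.concatenate([cnn_features, clinical], axis=1)
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(combined, labels)
cancer_prob = forest.predict_proba(combined)[:, 1]


# --- Answer 2: crop high-res patches around low-res detections ---
def crop_patches(full_res, low_res_centers, scale, size):
    """Cut size x size patches from full_res around centers given in low-res coordinates."""
    half = size // 2
    height, width = full_res.shape
    patches = []
    for row, col in low_res_centers:
        # Map the low-res coordinate back to full resolution, then clamp the
        # center so the whole patch stays inside the image.
        r = min(max(row * scale, half), height - half)
        c = min(max(col * scale, half), width - half)
        patches.append(full_res[r - half:r + half, c - half:c + half])
    return np.stack(patches)


full_res = rng.random((2048, 1536))        # stand-in for one mammogram
low_res = full_res[::8, ::8]               # crude 8x downsampling for the detector
centers = [(10, 20), (0, 0), (255, 191)]   # hypothetical detector output (row, col)
patches = crop_patches(full_res, centers, scale=8, size=128)
```

Here `patches` would be the input to the second model, and `cancer_prob` is only a training-set prediction to show the call shape; a real evaluation would need a held-out split.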

jeff · November 10, 2016, 5:58pm

Thanks, @jeremy. I’ve downloaded their sample data set of 500 mammogram images (recall that the full training data set is not downloadable outside of their cloud environment) and have done a bit of exploratory data analysis. Just as I suspected, this is a case of imbalanced classes, where healthy images vastly outnumber cancer examples. Because accuracy is no longer a meaningful metric here, I’m not sure which loss function to use. I know I should use area under the ROC curve, but there is no such evaluation function in Keras. Should I write my own, or does one already exist?

vshets · November 10, 2016, 6:14pm

Perhaps precision, recall or f1 score? Not sure if they exist in Keras.

jeff · November 10, 2016, 6:40pm

Good point. I should’ve checked scikit-learn (see https://github.com/fchollet/keras/issues/832), but I’m still not sure whether I need to change my loss function to something other than cross entropy.
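
For what it's worth, one way to get AUC without a built-in Keras metric is to score held-out predictions with scikit-learn's `roc_auc_score`. The sketch below uses simulated data: `y_val` is a made-up validation set with roughly 5% positives, and `val_probs` is a noisy stand-in for what `model.predict(X_val)` would return. It also shows why accuracy misleads on imbalanced classes.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical validation labels with about 5% positives (cancer).
y_val = (rng.random(2000) < 0.05).astype(int)

# Stand-in for model.predict(X_val): noisy scores that tend to be higher
# for positives. A real model's outputs would go here instead.
val_probs = np.clip(0.1 + 0.3 * y_val + rng.normal(0, 0.15, size=2000), 0, 1)

# A "model" that always predicts healthy looks accurate but ranks nothing.
always_healthy = np.zeros_like(y_val)
baseline_accuracy = accuracy_score(y_val, always_healthy)
baseline_auc = roc_auc_score(y_val, np.zeros(len(y_val)))  # constant scores

# AUC of the simulated model's scores.
model_auc = roc_auc_score(y_val, val_probs)

print(f"always-healthy accuracy: {baseline_accuracy:.3f}")
print(f"always-healthy AUC:      {baseline_auc:.3f}")
print(f"simulated model AUC:     {model_auc:.3f}")
```

The always-healthy baseline gets high accuracy simply because positives are rare, while its AUC sits at chance level, which is the reason to track AUC on the validation set here.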

jeremy · November 10, 2016, 6:43pm

I can’t see any reason not to use cross-entropy - can you?

jeff · November 10, 2016, 6:45pm

Cross entropy makes sense, but I was wondering if I could maximize AUC directly without going through the step of calculating cross entropy first.

jeremy · November 10, 2016, 7:04pm

Probably not - AUC is calculated entirely on the sort order of the data, which is not a continuous (or differentiable) value. It also takes a relatively long time to calculate (since it requires a sort).
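
A small pure-Python illustration of the sort-order point, as a sketch with made-up labels and scores. The helper `pairwise_auc` is a hypothetical name; it computes AUC as the fraction of (positive, negative) pairs ranked correctly, which gives the same number as the sort-based calculation but is slower.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs where the positive
    scores higher; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration only.
y = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.9]

base = pairwise_auc(y, scores)

# Any order-preserving change to the scores leaves AUC untouched.
cubed = pairwise_auc(y, [s ** 3 for s in scores])

# Nudging one score without changing the ordering also changes nothing,
# so the gradient with respect to that score is zero.
nudged = pairwise_auc(y, [0.1, 0.4, 0.36, 0.8, 0.9])

# AUC only moves, in a jump, once a score crosses a neighbour.
jumped = pairwise_auc(y, [0.1, 0.4, 0.45, 0.8, 0.9])

print(base, cubed, nudged, jumped)
```

Since the value is flat almost everywhere and jumps only when two scores swap order, gradient descent gets no useful signal from it, which is why cross-entropy is used as the training loss and AUC is kept as an evaluation metric.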

chris · November 19, 2016, 5:20pm

Anyone interested in mammography might want to watch this:

davecg · March 2, 2017, 8:13pm

Anyone else working on the DREAM challenge? It looks like the Cox lab from Harvard has found a method that works pretty well - AUC 0.8605. They have some interesting stuff on their GitHub, but I’m not sure what their method was for the mammography challenge.

https://github.com/coxlab

djones · July 2, 2017, 11:03am

Do you know where the Cox mammogram challenge code is available for download? Or any other available downloads from the competition? I had heard they were open source?