Deep dive into lung cancer diagnosis


(Jeremy Howard) #1

I’m getting a hard drive full of 15,000 patients’ worth of CT scans from NLST soon! :slight_smile: I’m hoping after this course finishes to dig deep into lung cancer diagnosis with this dataset, which is 15x larger than the one used for the Kaggle competition.

Just putting it out there in the hope that others in this group might want to join a project on this. (@davecg a deep learning trained radiologist would be particularly helpful. hint hint :wink: )

cc @taposh @rainman (from Kaiser) - since you’ve mentioned being interested in this previously.


(David Gutman) #2

:thumbsup:


(Constantin) #3

I am more than happy to help.


(Kent) #4

:thumbsup: Definitely interested in this project! I got inquiries from at least two sources about CT image analysis in the past few weeks.


(kelvin) #5

:thumbsup: interested


(RENJITH MADHAVAN) #6

That is great. Thank you. I am in.


#7

Count me in!


(Roy) #8

This is great news!! I am up for it :slight_smile: Let’s do it…


(sravya8) #9

Please, count me in!


(Uday Naik) #10

Awesome! I am game.


(melissa.fabros) #11

:ok_hand:


(Jeremy Howard) #12

This is exciting! We’ll need plenty of GPUs… Anyone got a friendly research center that might have a big box we could share for this research project? If not, we can just use our home machines, and be thoughtful about having different people running different experiments. I’m planning to submit a grant application to Nvidia as well.

A good way to get started is to work on the Kaggle competition now. Stuff you’ll want to practice:

  • Using traditional medical imaging approaches to find and remove lung walls, etc. (dilation/erosion, et al.)
  • Using u-net for segmentation (see the official kaggle tutorial); 100 layer tiramisu might be better still, of course!
  • Using the LUNA dataset for building an initial “nodule finder”
  • Getting the 5-dimensional mean shift clustering working as another nodule finder
  • Triplanar and/or 3-d CNNs for actual malignancy detection
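To make the first bullet concrete, here is a minimal sketch of the classical dilation/erosion-style preprocessing for isolating the lung field in a CT slice. This is not from the Kaggle tutorial itself; the threshold value and structuring-element size are illustrative assumptions that would need tuning on real NLST/LUNA data:

```python
# Hypothetical lung-mask sketch: threshold in Hounsfield units, drop the
# background air touching the image border, then close holes morphologically.
import numpy as np
from scipy import ndimage

def lung_mask(ct_slice_hu, threshold=-320):
    """Rough lung mask for a 2-D CT slice in Hounsfield units.

    threshold=-320 is an assumed cutoff separating air/lung parenchyma
    (~-1000 to -500 HU) from soft tissue (~0 HU and above).
    """
    binary = ct_slice_hu < threshold  # air + lung parenchyma
    labels, n = ndimage.label(binary)
    # Components touching the border are outside-body air: remove them.
    border = (set(labels[0, :]) | set(labels[-1, :])
              | set(labels[:, 0]) | set(labels[:, -1]))
    keep = [l for l in range(1, n + 1) if l not in border]
    mask = np.isin(labels, keep)
    # Closing (dilation then erosion) fills small holes from vessels/nodules.
    return ndimage.binary_closing(mask, structure=np.ones((5, 5)))
```

The same idea extends to 3-D volumes by labelling the full scan and using 3-D structuring elements; per-slice processing is just simpler to show here.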

We can give each other tutorials on any/all of these as we get into it.

In order to share the data with you, I’ll probably need to add you officially to the NLST project, so once I get the data I’ll probably need to get some official info from you all.


(Jeremy Howard) #13

Oh BTW @davecg (and anyone else with radiologist friends/family!) we’ll probably need some friendly and patient radiologists to do some basic tasks like:

  • Manually correcting the centroids (and removing false positives) from the nodule finder results, in order to get ground truth labels for nodule centers in NLST
  • Getting a “human benchmark” for both nodule finding and malignancy estimation to compare to (it would be best if this was done by some fairly experienced radiologists, so it’s a suitably challenging benchmark)
  • Checking the results of models to give feedback about where they seem to be going wrong
  • And probably plenty more…

(David Gutman) #14

Will we be getting longitudinal data from the NLST? Patients should have three annual scans in the trial, which would help with false positive reduction (we only really care about nodules that change in patients with a diagnosis of cancer).

As much fun as manually fixing labels for 15,000 chest CTs sounds, not sure I’m volunteering for that. :wink:

Happy to give feedback on false positives though.


(Roy) #15

Hi Jeremy, I have a cancer researcher friend. If you think we need someone with those skills, I can reach out to her for help with the grant. She did a postdoc at Stanford and UCLA and works for OncoMed Pharmaceuticals.


(Brendan Fortuner) #16

This project sounds great!


(Jeremy Howard) #17

Yup 3 years of scans. :slight_smile:

I believe there are just 1000 patients with cancer in the study, and they didn’t all have it showing in year zero. So it’ll be <3000 scans with malignant nodules. And we don’t need all of them for ground truth - just enough for a statistically valid comparison.

For the nodule centroids, we just need enough marked to make our models accurate enough. By fine-tuning from LUNA (which already has segmented nodules) I don’t think we’d need more than a few hundred, which (with a carefully designed nodule-marking app) could be done in a day or two I think.


(Jeremy Howard) #18

I should also mention - I really think part of what’s interesting about this is figuring out how to best get the necessary info from radiologists. So it really helps to have radiologists involved who are interested in better understanding deep learning and contributing actively to the systems and processes.


(Sam Witteveen) #19

very interested to join in with this


(sravya8) #20

@jeremy I have a couple of radiologists in the family :slight_smile: I can certainly ask them, as long as it is a good use of their time (i.e., most of it spent using their expertise). Any rough timeline for when you think we will need their input?