Deep dive into lung cancer diagnosis

(Vishnu Subramanian) #41

Another interesting writeup, by the 9th-place finisher in the 2017 National Data Science Bowl competition.

(Nagu) #42

The writeup from Daniel Hammack, of the 2nd-place team.


Reading Team Deep Breath’s approach.

(segovia) #44

I would like to participate as well, if it’s not too late. I am halfway through part 1 and working on the cervix cancer comp now. My home machine has two nvidia 1080 cards - not too much but still useful, right?

(Jeremy Howard) #45

2 1080 cards is great!

For everyone on this thread - digging into the data science bowl and trying to replicate the top results is the most helpful thing you can do right now. Please let us know how you go!

(Benedikt S) #46


This thread has been quiet for a while.
Is there any update? I have started to look into X-ray / CT lung scans.

I found two interesting GitHub repos:

The latter one in particular is super interesting: it uses the UNet architecture with 3D convolutions.

And this:



I’m not a radiologist, but am an internist and have looked at lots of CT scans and know a bit about how to decide whether a nodule is malignant or not. I can help provide some info from the medical side.

I’m working on a project with one of the 6th-place winners (though not on this project). Both of the 6th-place winners are radiologists.

(segovia) #48

Replicating the solutions is at the top of my to-do list. Thank you for these links! I hope doing it will help with the passenger scan comp too.

(Alexandre Cadrin-Chênevert) #49

Update: I am (as a Canadian board-certified radiologist) still available if needed when you are ready to classify/mark the dataset. Even more available if this is an open source/science project. After some deeper reflection, I think this is probably the key to applying the potential benefits of AI in healthcare fairly to the entire world population in the not-so-distant future. Especially in Canada with our public healthcare system, granting access to anonymized healthcare imaging or lab data is a lot easier for open science than for profit. The challenge for open science is to apply/integrate the technology in a distributable way for everyone with the current commercially based infrastructure. Maybe this open community could succeed.

(Alexandre Cadrin-Chênevert) #50

GitHub repo with the PyTorch source code of the winning team, grt123:
with the paper explaining their solution:

(David Gutman) #51

Interesting. I don’t think lung segmentation is a necessary step though; it might even make diagnosis harder (e.g. you can’t evaluate for axillary lymphadenopathy or adrenal metastasis). I would bet the model would perform just as well without it (although image size might become an issue).

(Alexandre Cadrin-Chênevert) #52

@davecg It would be hard to get a working/stable nodule detector without lung segmentation. Potential adenopathy and metastasis could be independently interesting to improve the final probability. But usually in lung cancer screening we need to find small nodules to detect early, treatable T1 cancers and save lives. There is usually no diagnostic dilemma about the dominant nodule/mass when we observe an N+ and/or M+ cancer. They present as large spiculated nodules or masses.

As an adjunct to the radiologist, this algorithm could be interesting for borderline nodules (6-10 mm) that are neither clearly suspicious nor clearly benign, to plan the workup (follow-up vs biopsy vs PET-CT vs surgery). Or of course, if superhuman nodule sensitivity/specificity is validated, it could be applied as a fully automated CT lung screening tool, with the nice responsibility challenge of dealing with unrelated, potentially significant observations (giant aortic aneurysm, lymphoma, thyroid cancer, pneumonia, tuberculosis, and many more) in the images. As I suggested in the screening mammography thread, it would be interesting to apply this code to a new external dataset with cross-validation from multiple radiologists to compare expert and machine ROCs, as Google did for its JAMA diabetic retinopathy article. @jeremy probably has some interesting ideas to improve the result even more.
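To make the expert-vs-machine ROC comparison concrete, here is a minimal sketch using only NumPy. It assumes binary ground-truth labels, a continuous model score per scan, and a yes/no call per scan from each radiologist; the function names (`roc_points`, `auc`, `operating_point`) are hypothetical and not taken from any of the solutions discussed in this thread.

```python
import numpy as np

def roc_points(labels, scores):
    """Return (fpr, tpr) swept over every distinct score threshold.

    Assumes both classes are present in `labels`.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    labels, scores = labels[order], scores[order]
    tp = np.cumsum(labels)    # true positives when calling the top-k scans positive
    fp = np.cumsum(~labels)   # false positives for the same cut
    # Keep only the last index of each run of tied scores.
    last = np.r_[np.nonzero(np.diff(scores))[0], len(scores) - 1]
    tpr = np.r_[0.0, tp[last] / labels.sum()]
    fpr = np.r_[0.0, fp[last] / (~labels).sum()]
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def operating_point(labels, calls):
    """Sensitivity and specificity of one reader's binary calls."""
    labels = np.asarray(labels, dtype=bool)
    calls = np.asarray(calls, dtype=bool)
    sensitivity = (labels & calls).sum() / labels.sum()
    specificity = (~labels & ~calls).sum() / (~labels).sum()
    return float(sensitivity), float(specificity)
```

Each radiologist then becomes a single point at (1 - specificity, sensitivity) that can be compared against the model's curve: a reader whose point lies below the curve is matched or beaten by the model at that operating point.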

(David Gutman) #53

As long as you had a labeled lung nodule dataset like LUNA, lung segmentation should be unnecessary. Even without taking into account features far from the lung, I would not want to train solely on lung parenchyma without including pleura and chest wall.

I know that I would be far more likely to miss a small nodule than an invasive cancer, but I also don’t think radiologists or referring providers would put much faith in a model that missed the latter.

(Andrew) #54

Hey Everyone,

I’m just getting back into learning this. I’m very much a beginner.
I have four 1080 Ti cards, maybe five, to lend to the effort.

(Jeremy Howard) #55

What’s absolutely needed are the locations of the centroids of the nodules. Not a full segmentation - although that can be helpful (the winners took advantage of the full polygon info from LUNA).

Using this info doesn’t stop you from including chest wall and pleura features - indeed the #3 team did exactly that!
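As a rough illustration of why a centroid is a weaker requirement than a full segmentation, the sketch below derives a centroid from a binary nodule mask and converts it between voxel indices and scanner (world) coordinates. It assumes an axis-aligned scan (no rotation), with origin and voxel spacing given in (z, y, x) order in millimetres; all function names are hypothetical.

```python
import numpy as np

def mask_centroid(mask):
    """Centroid (z, y, x), in voxel units, of a binary 3D nodule mask."""
    coords = np.argwhere(mask)  # one row per voxel inside the nodule
    if coords.size == 0:
        raise ValueError("mask contains no nodule voxels")
    return coords.mean(axis=0)

def voxel_to_world(voxel_zyx, origin_zyx, spacing_zyx):
    """Map voxel indices to millimetre coordinates (axis-aligned scans only)."""
    voxel = np.asarray(voxel_zyx, dtype=float)
    origin = np.asarray(origin_zyx, dtype=float)
    spacing = np.asarray(spacing_zyx, dtype=float)
    return origin + voxel * spacing

def world_to_voxel(world_zyx, origin_zyx, spacing_zyx):
    """Inverse of voxel_to_world; the result may be fractional."""
    world = np.asarray(world_zyx, dtype=float)
    origin = np.asarray(origin_zyx, dtype=float)
    spacing = np.asarray(spacing_zyx, dtype=float)
    return (world - origin) / spacing
```

Going from a mask to a centroid is a one-liner, while the reverse is not possible, which is why centroid labels are much cheaper for a radiologist to provide than full outlines.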

(Saurabh Jha) #56

I want to be part of this… I am a novice in deep learning but very interested in working on this… How can I get a good labeled dataset to start building the algorithm and try out my experiments with it? Please provide your suggestions.

(Kevin Mader) #57

@jeremy I definitely like the idea of the project and am willing to help. I definitely think having so many scans should make it easier to get good results. We are working on a similar project with PET-CT images (only 2000 scans though) and, like many medical projects, ran into serious issues with physician agreement when making ground truth. If you are Google, you can buy your way out of the issue with 21 different physicians, but do you have any ideas how it could work for this project?

As far as making tools for radiologists goes, the following project has made quite a few inroads in moving machine learning ideas into medical tools in practice. We have used Slicer as the starting point for our annotation tools, since it is Python-based and easy to customize.

(Austin Jacobson) #58

Is anyone still actively working on this? I would love to get involved. I have some experience with conv nets, a small GPU (970), and I may be able to find some friendly radiologists.

(Jeremy Howard) #59

I think we’re likely to look into it more in the coming months since we have a project collaboration between USF and UCSF coming up.

Brad Kenstler has done a great CT preprocessing walkthrough here:

(Victor Alfonso Arias Vanegas) #60

I’m new, but I’m very interested in lung cancer detection.