Deep dive into lung cancer diagnosis

@davecg It would be hard to get a working, stable nodule detector without lung segmentation. Possible adenopathy and metastases could be independently useful for improving the final probability. But in lung cancer screening the main goal is to find small nodules, so that early, treatable T1 cancers are detected and lives are saved. There is usually no diagnostic dilemma about the dominant nodule or mass when we see an N+ and/or M+ cancer: these present as large spiculated nodules or masses.

As an adjunct to the radiologist, this algorithm could be useful for borderline nodules (6-10 mm) that are neither clearly suspicious nor clearly benign, to plan the workup (follow-up vs biopsy vs PET-CT vs surgery). Or of course, if its nodule sensitivity/specificity is validated as better than that of human experts, it could be applied as a fully automated CT lung screening tool, with the nice responsibility challenge of dealing with unrelated, potentially significant findings in the images (giant aortic aneurysm, lymphoma, thyroid cancer, pneumonia, tuberculosis, and many more). As I suggested in the screening mammography thread, it would be interesting to apply this code to a new external dataset, with cross-validation from multiple radiologists, to compare expert and machine ROCs as Google did for its JAMA diabetic retinopathy article. @jeremy probably has some interesting ideas to improve the result even further.
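
To make the expert-vs-machine comparison concrete, here is a minimal sketch in plain Python. It is only an illustration: the labels, model scores, and reader names (`reader_a`, `reader_b`) are made-up toy values, and a real study would need confidence intervals and a proper reference standard. The model gets an AUC from its continuous scores, while each radiologist, who gives a yes/no call, becomes a single sensitivity/specificity point that can be plotted against the model's ROC curve.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive case outscores a random
    negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def operating_point(labels, calls):
    """(sensitivity, specificity) for one reader's binary calls.
    Assumes the labels contain both cancer and non-cancer cases."""
    pairs = list(zip(labels, calls))
    tp = sum(1 for y, c in pairs if y == 1 and c == 1)
    fn = sum(1 for y, c in pairs if y == 1 and c == 0)
    tn = sum(1 for y, c in pairs if y == 0 and c == 0)
    fp = sum(1 for y, c in pairs if y == 0 and c == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration only (1 = cancer, 0 = no cancer).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
reader_calls = {
    "reader_a": [1, 1, 0, 0, 0, 1, 0, 1],
    "reader_b": [1, 1, 1, 1, 0, 0, 0, 1],
}

auc = roc_auc(labels, model_scores)
points = {name: operating_point(labels, calls)
          for name, calls in reader_calls.items()}

print("model AUC:", auc)
for name, (sens, spec) in points.items():
    print(name, "sensitivity:", sens, "specificity:", spec)
```

A reader whose point lies above the model's ROC curve is outperforming the model at that operating point, and one below it is not.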

As long as you have a labeled lung nodule dataset like LUNA, lung segmentation should be unnecessary. Even without taking into account features far from the lung, I would not want to train solely on lung parenchyma without including pleura and chest wall.

I know that I would be far more likely to miss a small nodule than an invasive cancer, but I also don’t think radiologists or referring providers would put much faith in a model that missed invasive cancers.

Hey Everyone,

I’m just getting back into learning this. I’m very much a beginner.
I have four 1080 Tis, maybe five, to lend to the effort.

What’s absolutely needed are the locations of the centroids of the nodules. Not a full segmentation - although that can be helpful (the winners took advantage of the full polygon info from LUNA).

Using this info doesn’t stop you from including chest wall and pleura features - indeed the #3 team did exactly that!
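
As a rough sketch of how centroid-only labels can be used: convert each annotated centroid from world coordinates (mm) to voxel indices, then cut a fixed-size cube around it. The result still contains whatever pleura or chest wall sits next to the nodule. Everything below is an assumption made for illustration: the function names, the origin/spacing values, the 32-voxel patch size, and the (x, y, z) axis order. Check how your loader orders axes, since many return arrays indexed (z, y, x).

```python
def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Map a point in scanner/world coordinates (mm) to integer voxel
    indices, assuming an axis-aligned scan: voxel = (world - origin) / spacing."""
    return tuple(
        int(round((w - o) / s))
        for w, o, s in zip(world_xyz, origin_xyz, spacing_xyz)
    )


def patch_bounds(center, patch_size, volume_shape):
    """(start, stop) indices per axis for a cube of `patch_size` voxels
    around `center`, shifted where needed so it stays inside the volume."""
    bounds = []
    for c, dim in zip(center, volume_shape):
        if patch_size > dim:
            raise ValueError("patch is larger than the volume")
        start = c - patch_size // 2
        start = max(0, min(start, dim - patch_size))
        bounds.append((start, start + patch_size))
    return bounds


# Hypothetical scan geometry and nodule centroid, in (x, y, z) order.
origin = (-200.0, -180.0, -300.0)   # mm
spacing = (0.7, 0.7, 2.5)           # mm per voxel
volume_shape = (512, 512, 120)      # voxels
centroid_mm = (-60.0, 30.0, -100.0)

centroid_vox = world_to_voxel(centroid_mm, origin, spacing)
bounds = patch_bounds(centroid_vox, 32, volume_shape)
# A centroid close to the last slice: the cube is shifted back inside.
edge_bounds = patch_bounds((200, 300, 110), 32, volume_shape)

print(centroid_vox, bounds, edge_bounds)
```

The patch here deliberately ignores any lung mask, so tissue just outside the parenchyma stays available to the model.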

I want to be part of this. I am a novice in deep learning but very interested in working on this. How can I get a good labeled dataset to start building the algorithm and try out my experiments with it? Please share your suggestions.

@jeremy I definitely like the idea of the project and am willing to help. I think having so many scans should make it easier to get good results. We are working on a similar project with PET-CT images (only 2000 scans though) and, like many medical projects, ran into serious issues with physician agreement when establishing ground truth. If you are Google you can buy your way out of the issue with 21 different physicians (http://www.nature.com/nature/journal/v542/n7639/full/nature21056.html?foxtrotcallback=true), but do you have any ideas how it could work for this project?

As far as making tools for radiologists, the following project (https://github.com/concept-to-clinic/concept-to-clinic) has made quite a few inroads in moving machine learning ideas into medical tools used in practice. We have used Slicer as the starting point for our annotation tools, since it is scriptable in Python and easy to customize.

Is anyone still actively working on this? I would love to get involved. I have some experience with conv nets, a small GPU (970), and I may be able to find some friendly radiologists.

I think we’re likely to look into it more in the coming months since we have a project collaboration between USF and UCSF coming up.

Brad Kenstler has done a great CT preprocessing walkthrough here: http://nbviewer.jupyter.org/github/bckenstler/dsb17-walkthrough/blob/master/Part%201.%20DSB17%20Preprocessing.ipynb
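
For anyone new to CT data, here is a minimal sketch of one common preprocessing step. It is not taken from the notebook above. Raw stored values are mapped to Hounsfield units (HU) using the DICOM rescale slope and intercept, then clipped to a window and scaled to [0, 1]. The default slope/intercept and the -1000 to 400 HU window are assumptions chosen for illustration. Read the real slope and intercept from each scan's DICOM header (`RescaleSlope`, `RescaleIntercept`).

```python
def to_hounsfield(raw_value, slope=1.0, intercept=-1024.0):
    """Convert a stored pixel value to Hounsfield units.
    The default slope/intercept are illustrative; use the scan's own values."""
    return raw_value * slope + intercept


def normalize_hu(hu, lo=-1000.0, hi=400.0):
    """Clip HU to [lo, hi] and rescale to [0, 1].
    The window is an assumed choice: roughly air (-1000) up to bone (400)."""
    clipped = max(lo, min(hi, hu))
    return (clipped - lo) / (hi - lo)


# A few example stored values pushed through both steps.
raw_values = [0, 1024, 2000]
hu_values = [to_hounsfield(v) for v in raw_values]
normalized = [normalize_hu(h) for h in hu_values]

print(hu_values)    # HU for each raw value
print(normalized)   # values in [0, 1] after windowing
```

In practice the same arithmetic would be applied to whole arrays at once, followed by resampling to a common voxel spacing.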

I’m new, but I’m very interested in lung cancer detection.

I am writing to everyone who has the data from the Kaggle Data Science Bowl 2017. If you still have the data, could you please join in seeding the official Kaggle torrent files?

I would very much like to practice and learn by going through the top-placed solutions and looking at the data, but there are currently no seeders. I am based in China, and direct download is just impossible with the Chinese government’s censorship. It would be great if you could join in. Thanks!

Just commenting to say I’m interested in helping out in the future! I’m a bioengineer with a PhD in image analysis, now working in deep learning, so this data is kind of the perfect match! I’ll send this to my academic connections, you never know!

@jeremy I’ll join once I’m done with Part 1 v2, if that’s alright 🙂

I am interested in this, thank you!

Hi all. The challenge of nodule detection seems to be a step beyond a ‘cats vs dogs’ type situation, where you might only be trying to tell the difference between healthy and unhealthy (any condition) CT scans. In that case, I’m guessing you need some sort of 3D CNN, as I believe radiologists take the z-axis into account when examining CT scans. That is, they don’t look at each slice in isolation (like you might do with 2D radiographs), but instead scroll through and look at how objects behave in all 3 dimensions. Does that sound about right? Thanks.
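
A toy sketch of why the z-axis matters. It is an invented example, not anyone's actual model. A small round "nodule" and a "vessel" running through the slices look identical on a single axial slice, but a 3D kernel that also sees the neighbouring slices responds differently to them. The volumes and the all-ones 3x3x3 kernel are assumptions made for illustration. As in most CNN libraries, the "convolution" here is really a cross-correlation.

```python
def make_volume(depth, height, width):
    """Empty volume as nested lists, indexed [z][y][x]."""
    return [[[0.0] * width for _ in range(height)] for _ in range(depth)]


def conv3d_at(volume, kernel, z, y, x):
    """Response of a cubic kernel centred on (z, y, x).
    Voxels outside the volume are treated as 0 (zero padding)."""
    k = len(kernel) // 2
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    total = 0.0
    for dz in range(-k, k + 1):
        for dy in range(-k, k + 1):
            for dx in range(-k, k + 1):
                zz, yy, xx = z + dz, y + dy, x + dx
                if 0 <= zz < depth and 0 <= yy < height and 0 <= xx < width:
                    total += kernel[dz + k][dy + k][dx + k] * volume[zz][yy][xx]
    return total


# "Nodule": a single bright voxel in the middle slice only.
nodule = make_volume(5, 5, 5)
nodule[2][2][2] = 1.0

# "Vessel": the same bright voxel repeated in every slice (a tube along z).
vessel = make_volume(5, 5, 5)
for z in range(5):
    vessel[z][2][2] = 1.0

# All-ones 3x3x3 kernel: sums the voxel and its neighbours in all 3 axes.
kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]

same_slice = nodule[2] == vessel[2]            # identical in 2D
nodule_response = conv3d_at(nodule, kernel, 2, 2, 2)
vessel_response = conv3d_at(vessel, kernel, 2, 2, 2)

print(same_slice, nodule_response, vessel_response)
```

A purely 2D model looking at the middle slice gets the same input for both cases, which is one reason 3D or multi-slice inputs are commonly used for nodule detection.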

I am still potentially interested in collaborating with you @jeremy, even though I currently have some other active projects. Everything started here last year for me.

The project sounds interesting. I also want to be a part of it.

I just started part 1 v2 but would definitely be very interested in participating in this project!

I am also interested in joining the project when it kicks off. Meanwhile, I will prepare myself by going through the preprocessing Jupyter notebook.

If anyone gets through the preprocessing and starts training some models on the Data Science Bowl competition data, please do let us know! That’s the starting point we need to get to before we can start making progress on this.

Do you have a link to the Data Science Bowl competition? I am interested in at least looking into this and contributing if possible.