Deep dive into lung cancer diagnosis


I am in too @jeremy

I am interested as well.

I wouldn’t expect it would be for at least a couple of months.


I can float the idea around my department, but people might be a bit wary of the Enlitic connection.

@davecg there is no connection any more - I’m not an employee or on the board; just a shareholder.


Count me in! I’m doing the Kaggle competition right now, in a team with a friendly radiologist. Would love to continue working on this after the competition.


@jeremy: if it’s not too late, I’d like to join this effort. I’m very interested in lung cancer detection. I can contribute some free time over the next couple of months, and a GPU.

Any specific steps you suggest to start?

Second place solution on Kaggle for DSB 2017:


Thanks for sharing, @davecg. Very interesting, though not easy to follow. The one point I did understand is that they used cross-validation instead of the leaderboard for model selection. That is helpful information.
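For anyone unfamiliar with the idea, here is a minimal scikit-learn sketch of selecting a model by mean cross-validated score instead of by repeatedly probing the public leaderboard (which tends to overfit to it). The data and candidate models below are purely illustrative stand-ins, not anything from the winning solution:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in data; in the competition this would be
# CT-derived features, not a synthetic dataset.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Illustrative candidates (the winners compared CNN variants, not these).
candidates = {
    "lr_C=0.1": LogisticRegression(C=0.1, max_iter=1000),
    "lr_C=1.0": LogisticRegression(C=1.0, max_iter=1000),
}

# Score each candidate with 5-fold cross-validation on the training
# data only, then keep the one with the best mean score.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

The key point is that every number used to pick the model comes from held-out folds of the training set, so the leaderboard stays an honest, unseen test.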

@ljubomir, working through the tutorial and top rated kernels on the kaggle comp is the best place to start, and then reading the winners’ various solutions. I haven’t received the full dataset yet, but we won’t need to use that until we’ve mastered the kaggle dataset.

I would love to participate in this effort as well. I’m still halfway through part 1, but will catch up eventually.

I’ve actually been working in medical imaging for 30+ years, but always steered clear of segmentation algorithms because they were just too hard for me - with modern techniques, things are so much easier. My expertise is algorithm tuning and 3D visualization - at some point, I’ll figure out how to make beautiful 3D renderings of our results :slight_smile:

Even if my time is limited, I could be interested in this project to help mark up parts of @jeremy’s new dataset. I don’t know the definition of expertise, but I have personally done about 500 nodule biopsies in the past 10 years and probably read about 15,000 thoracic CTs (about 500 images per exam) …
Of course, decent markup software would be a must.
I’ll set up a machine soon if it can help provide some GPU power for the training process (probably 2 × 1080 Ti). The GPUs will also be used for a personal ultrasound project, which I’ll probably talk about a bit more on the forums in the next few weeks.


Great! I’m expecting to receive the data this week - although I’ll need to focus on getting the MOOC done first…

@jeremy I would love to participate. I hope it’s fine that I didn’t take part in the course(s). I have my own box (1080) and I practice DL - we’ve actually met in the study group prior to the course. It’s an amazing cause and I’m in favour of any objective diagnosis / tests.

P.S. I absolutely look up to @rachel and yourself. Keep up the amazing work on the course [and the community aspect].

P.S. P.S.
Maybe it’s overkill and not needed (you know this problem and the dataset vs. accuracy trade-offs better than anyone else), but maybe trying to implement:
could also squeeze out a few more % while clearly saving a lot of training time and providing more FLOPS [and it’s the best ImageNet model].
I really liked its incorporation of the much-needed attention mechanism.

Of course! The data just arrived. :slight_smile:

I don’t know how well attention-ResNet will work on 3D or very large images - maybe you could try it out on the Kaggle competition data or LUNA?


You’re right, good points. No clue - I have no experience with 3D / very large images. That paper just came out, and I’m not aware of an implementation of it (if there is one, I would gladly try it). I suggested it because of its benchmarks and speed, and I found their way of implementing attention, and their comparisons to other attention mechanisms, useful. But you’re right, I have no clue regarding those. Their “mixed attention” is a function of both pixels and channels, so if I got it right, it should work.

How does ResNet work with such data? 3D + very large?

One of the kaggle lung comp 2nd place winners used it. So it works well, apparently!
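For concreteness, the 2D-to-3D jump mostly means swapping `Conv2d`/`BatchNorm2d` for their 3D counterparts. Below is a minimal PyTorch sketch of a 3D residual block - purely illustrative, not the winners’ architecture - which also shows why memory becomes the constraint on whole-scan inputs:

```python
import torch
import torch.nn as nn

class ResBlock3d(nn.Module):
    """A minimal 3D residual block with an identity shortcut (sketch only)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut keeps shapes equal

# A single 64x64x64 patch is already ~1 MB per channel in float32,
# which is why whole-scan 3D models (and 3D attention maps) get
# expensive fast - most entries trained on cropped patches instead.
x = torch.randn(1, 8, 64, 64, 64)
y = ResBlock3d(8)(x)
print(y.shape)  # torch.Size([1, 8, 64, 64, 64])
```

Same-padding convolutions keep the output shape equal to the input, so blocks like this can be stacked just as in 2D ResNets; the cost is that activations scale with the cube of the patch size.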


I would love to participate. In a past life I wrote image reconstruction algorithms for PET and fMRI data, though I’m not sure how useful that will be here. Still, the prospect of being part of creating an automated system for multilabel pathology identification from X-rays sounds very exciting.

Most state-of-the-art solutions for parsing medical images are based on ML techniques, but searching high-dimensional parameter spaces is probably better and more efficient with DL.

Write-up by the 2nd place winner of the 2017 National Data Science Bowl (lung cancer prediction from CT scans):

and code is available here.