thanks for this wonderful intro to the problem and the domain!
@alexandrecc this is such great information - is there any chance you might consider copying it into a medium post? If not, do you mind if I turn it into a post later?
Yes, sure. Let me find some images to improve the formatting and I’ll let you know when I post it on Medium. I’ll try to do this before leaving for RSNA on Saturday.
Nice intro from a radiologist’s perspective!
BTW, I’m coming to RSNA too. It would be very nice to meet up with you and say hello in person. If you don’t mind, please let me know. Let’s enjoy RSNA 2017!
Blog post inspired by previous forum post available here:
Deep learning with medical images
Very good post! I’m very interested and halfway through the processing steps!
I cloned the grt123 team’s repo and finished the preprocessing. Still trying to understand the details of each piece of code. Kinda overwhelmed by the complexity of their solution…
Learning Pytorch in the meantime.
They are also using the grt123 solution.
I have a question for the experts in this field. I am currently working through Brad Kenstler’s notebook. The images I am working with have modality MR. I’ve tried to look online but couldn’t find a good explanation of how to map MR intensities to Hounsfield units for visualization purposes. I would very much appreciate it if someone could help me out. Maybe I am searching for the wrong thing. Thanks!
And I suspect the interpretation of intensities in CT and MR is different, which makes it hard to skip the visualization part of the notebook and continue with the actual preprocessing. In particular, the ROI thresholds probably differ by organ (lung vs. brain) and by modality (CT vs. MR).
Hounsfield units apply only to CT.
Depending on the MRI sequence it might have intrinsic meaning (eg ADC, quantitative flow sequences, some perfusion metrics), but even those will vary from scanner to scanner.
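For CT, by contrast, Hounsfield units do have a fixed physical meaning (air ≈ −1000 HU, water = 0 HU), so a common visualization step is clipping to a display window and rescaling. A minimal NumPy sketch of that idea (the `window_ct` helper and its default lung-window values are my own illustrative choices, not from the notebook):

```python
import numpy as np

def window_ct(volume_hu, center=-600.0, width=1500.0):
    """Clip a CT volume (in Hounsfield units) to a display window
    and rescale to [0, 1]. Defaults approximate a lung window."""
    lo, hi = center - width / 2, center + width / 2
    windowed = np.clip(volume_hu, lo, hi)
    return (windowed - lo) / (hi - lo)

# Toy voxels: air (-1000 HU), soft tissue (~40 HU), bone (~700 HU)
vol = np.array([[[-1000.0, 40.0, 700.0]]])
out = window_ct(vol)
print(out)  # bone saturates at 1.0 in a lung window
```

Because HU values are standardized, the same window gives comparable images across CT scanners, which is exactly what MR intensities lack.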
Demeaning and dividing by std for the volume should be a reasonable way to start, but you should check to make sure even/odd slices aren’t very different (MRI series are sometimes collected “interleaved” and on some scans you will notice alternating intensity levels). Normalizing by slice might avoid this problem, but slices that are nearly empty will be normalized very differently than slices with a lot of tissue (you can see this when viewing images on many PACS systems).
You also usually need bias correction using a tool like N4 (http://stnava.github.io/ANTs/) for research workflows and motion correction if you have time series data (eg MCFLIRT from FSL, basically just rigid registration across timepoints).
Some of these tools might not be necessary for deep learning models, and many others could stand to be updated to use the GPU.
This is all really valuable information, thank you so much. We currently have MRI scans for around 300 meningioma patients. Each scan is 124x512x512 (slices, height, width). Our first task is to build a model that can auto-contour meningioma tumors. Since we have raw data, we will probably do a lot of preprocessing before feeding it into neural nets, such as normalization, skull stripping, and other steps that might be helpful for the task.
I appreciate your help, and if you don’t mind, may I ask for help in this thread as well: Lung cancer detection; Convolution features + Gradient boost. Thanks in advance.
I’m working on the same task. Following you!
Very interested in this topic. I’m part of the Deep Learning Brasília (Brazil) group and we’re on lesson 6 of part 1.
Jeremy, congratulations on this initiative, and thanks for the opportunity to take the fastai deep learning course. You’re THE GUY!
I am really interested in this topic! Is it too late to help out?
Is this dataset different from the one hosted at drivenbydata.org?
I am very enthusiastic about this topic as well and would greatly appreciate an update from Jeremy, since it sounds like the data has been available for more than a year now.
Our group at the University of Basel is currently working on lung cancer diagnosis (based on CT scans and reports) together with the radiology department of the university hospital and I will draw their attention to this opportunity (and kindly ask the radiologists for their help in labelling).
Further, I’d like to point out that it’s very unfortunate that the data from the Kaggle 2017 Data Science Bowl is no longer available. In this regard, it would be extremely helpful if at least a sample of the NLST data were provided, to get people started on the preprocessing and on replicating models.
I’m currently exploring the CT scan challenge on Kaggle. I’d love to know where this thread went. Did anyone have any success? Did anyone produce a fastai based CT scan example?
Hey, I work in a medical AI lab at a university, and one of our projects is on this very topic. Are you still open to collaborations?