Deep dive into lung cancer diagnosis

(Maxim Pechyonkin) #61

I am writing to everyone who has the data from the Kaggle Data Science Bowl 2017. If you still have the data, could you join in seeding the official Kaggle torrent files?

I would very much like to practice and learn by going through the top-placed solutions and looking at the data, but there are currently no seeders. I am based in China, and direct download is simply impossible because of the Chinese government's censorship. It would be great if you could join in. Thanks!

(Davide Boschetto) #62

Just commenting to say I'm interested in helping out at some point! I'm a bioengineer with a PhD in image analysis, now working in deep learning, so this data is kind of the perfect match! I'll send this to my academic connections, you never know!

(Kousik) #63

@jeremy I'll join once I'm done with Part 1 v2, if that's alright :slight_smile:

(danny iskandar) #64

I am interested in this, thank you!

(Rob Harrand) #65

Hi all. The challenge of nodule detection seems to be a step beyond a 'cats vs dogs' type of situation, where you might only be trying to tell the difference between healthy and unhealthy (any condition) CT scans. For nodule detection, I'm guessing you need some sort of 3D CNN, as I believe radiologists examine CT scans taking the z-axis into account. I.e. they don't look at each slice in isolation (as you might with 2D radiographs), but instead scroll through the stack and look at how objects behave in all 3 dimensions. Does that sound about right? Thanks.
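To make the 2D-vs-3D distinction concrete, here is a minimal sketch (pure Python, no deep-learning framework; the function name `conv3d_valid` and the toy data are hypothetical) of a single 3D filter sliding over a CT-like volume indexed as `volume[z][y][x]`. Unlike a 2D filter applied one slice at a time, each output value here mixes neighbouring slices along z, which is the property a 3D CNN layer builds on. A real model would use a framework's 3D convolution layer; this only illustrates the idea.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Naive 3D cross-correlation (what CNN layers call 'convolution')
    with 'valid' padding: the kernel never hangs off the volume edge."""
    kz, ky, kx = len(kernel), len(kernel[0]), len(kernel[0][0])
    vz, vy, vx = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for z in range(vz - kz + 1):
        plane = []
        for y in range(vy - ky + 1):
            row = []
            for x in range(vx - kx + 1):
                # Sum over a kz x ky x kx neighbourhood, so every output
                # value combines information from kz adjacent slices.
                total = 0.0
                for dz in range(kz):
                    for dy in range(ky):
                        for dx in range(kx):
                            total += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


# Toy 4x4x4 "scan": zeros everywhere except a bright 2x2x2 blob
# occupying z, y, x in {1, 2}.
scan = [[[1.0 if (1 <= z <= 2 and 1 <= y <= 2 and 1 <= x <= 2) else 0.0
          for x in range(4)] for y in range(4)] for z in range(4)]
# 2x2x2 averaging kernel.
avg = [[[1 / 8] * 2 for _ in range(2)] for _ in range(2)]

response = conv3d_valid(scan, avg)
print(len(response), len(response[0]), len(response[0][0]))  # 3 3 3
print(response[1][1][1])  # 1.0: the window sits exactly on the blob
print(response[0][1][1])  # 0.5: window spans empty slice 0 and blob slice 1
```

The last line is the point: the response at z = 0 depends on what is in slice 1, which a purely 2D, slice-by-slice filter would never see.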

(Alexandre Cadrin-Chênevert) #66

I am still potentially interested in collaborating with you @jeremy, even if I currently have some other active projects. Everything started here last year for me.

(Divyansh Jha) #67

The project sounds interesting. I also want to be a part of it.

(James Requa) #68

I just started part 1 v2 but would definitely be very interested in participating in this project!


I am also interested in joining the project when it kicks off. Meanwhile, I will prepare myself by going through the preprocessing Jupyter notebook.

(Jeremy Howard) #70

If anyone gets through the preprocessing and starts training some models on the data science bowl competition data, please do let us know! That’s the starting point we need to get to before we can start making progress on this.

(Kevin Bird) #71

Do you have a link to the Data Science Bowl competition? I am interested in looking into this at least, and contributing if possible.


I am currently using Brad Kenstler’s notebook for preprocessing the images.

(segovia) #73

I am using this kernel to pre-process the files and it works fine (with a small change, as suggested in the comments). But I am not sure whether I should compute the segment_lung_mask for each image. I think the answer is no, because as long as I have the processed images as NumPy arrays, I can start running a CNN on them. Just want to confirm before moving on…
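One way to see that the lung mask is a separate, optional step is to write out the intensity normalization on its own. The sketch below is an illustration, not the kernel's code: it clips Hounsfield-unit (HU) values to an assumed window of -1000 to 400 HU, rescales them to [0, 1], and only zeroes out voxels when a boolean mask is supplied. The function name `normalize_hu`, the window bounds, and the mask input are all hypothetical, and plain Python lists stand in for the NumPy arrays mentioned above.

```python
from typing import List, Optional

Volume = List[List[List[float]]]  # volume[z][y][x], values in HU

# Assumed intensity window, chosen for illustration: air is about -1000 HU,
# and values above ~400 HU (mostly bone) are clipped.
MIN_HU = -1000.0
MAX_HU = 400.0


def normalize_hu(volume: Volume,
                 lung_mask: Optional[List[List[List[bool]]]] = None) -> Volume:
    """Clip HU values to [MIN_HU, MAX_HU] and rescale them to [0, 1].

    If a boolean lung mask of the same shape is given (hypothetical here;
    it would come from a separate segmentation step), voxels outside the
    mask are set to 0.0. Without a mask the whole volume is kept.
    """
    out = []
    for z, plane in enumerate(volume):
        new_plane = []
        for y, row in enumerate(plane):
            new_row = []
            for x, hu in enumerate(row):
                clipped = min(max(hu, MIN_HU), MAX_HU)
                value = (clipped - MIN_HU) / (MAX_HU - MIN_HU)
                if lung_mask is not None and not lung_mask[z][y][x]:
                    value = 0.0
                new_row.append(value)
            new_plane.append(new_row)
        out.append(new_plane)
    return out


scan = [[[-2000.0, -1000.0, -300.0, 1000.0]]]  # shape 1 x 1 x 4
mask = [[[True, True, False, True]]]
print(normalize_hu(scan))        # [[[0.0, 0.0, 0.5, 1.0]]]
print(normalize_hu(scan, mask))  # [[[0.0, 0.0, 0.0, 1.0]]]
```

The normalized volume is already a valid CNN input without the mask; whether masking out non-lung tissue actually helps a model is an empirical question this sketch does not answer.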

(Rikiya Yamashita) #74

I'm on part 1 v2 now, and would definitely be very interested in this project, because I'm also a diagnostic radiologist :wink: and this particular competition was exactly what made me realize the importance of learning DL more practically :smile:

(segovia) #75

Hi @jeremy, could you please clarify the purpose of this project again? I am very interested, but I'm not quite sure what we want to achieve here.

If the purpose of this project is to use the large dataset you have to train a model and get good performance, then we could just replicate the winner's model, is that correct? Or are you trying to develop better models with better architectures?

(David Gutman) #76

If you can't beat 'em, join 'em. (I'm one too. :slight_smile: )

(Jeremy Howard) #77

I’d like to improve on the winner’s model, particularly by using more data. But I think replicating their model is a necessary first step, and also really understanding it.

(Rikiya Yamashita) #78

Sounds interesting :sunglasses:
Who do you mean by “them”? “grt123”, by any chance?

(nicole) #79

Please count me in!

(Surya K) #80

@jeremy, @alexandrecc: I am definitely interested! Are there any resources to better understand the domain, the task and the technical terms?