Deep dive into lung cancer diagnosis

maxim.pechyonkin · October 24, 2017, 4:22am

I am writing to everyone who has the data from the Kaggle Data Science Bowl 2017. If you still have it, could you help seed the official Kaggle torrent files?

I would very much like to practice and learn by going through the top-placed solutions and looking at the data, but there are currently no seeders. I am based in China, and direct download is impossible because of the Chinese government’s censorship. It would be great if you could join in. Thanks!

DavideBoschetto · October 24, 2017, 6:48am

Just commenting to say I’m interested in helping out in the future! I’m a bioengineer with a PhD in image analysis, now working in deep learning, so this data is a perfect match for me! I’ll send this to my academic connections, you never know!

codeck · October 24, 2017, 6:56am

@jeremy I’ll join once I’m done with Part 1 v2, if that’s alright.

diskandar · October 24, 2017, 9:40am

I am interested in this, thank you!

tentotheminus9 · October 24, 2017, 12:49pm

Hi all. The challenge of nodule detection seems to be a step beyond a ‘cats vs dogs’ type situation, where you might only be trying to tell healthy CT scans apart from unhealthy ones (any condition). For nodule detection, I’m guessing you need some sort of 3D CNN, as I believe radiologists examine CT scans taking the z-axis into account. That is, they don’t read one slice at a time (as you might with 2D radiographs), but instead scroll through the stack and look at how objects behave in all 3 dimensions. Does that sound about right? Thanks.
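A minimal sketch of the intuition, assuming NumPy and two hypothetical toy volumes: a single bright voxel (nodule-like, compact in all three axes) and a bright line running along z (vessel-like). On the middle slice both look like the same bright dot, so a per-slice 2D kernel responds identically to them, while a 3D kernel that also looks at the slices above and below can tell them apart. The kernel values are hand-picked for illustration only; a real 3D CNN would learn its own kernels, and nothing here is taken from any competition solution.

```python
import numpy as np


def conv3d_valid(volume: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 'valid' 3D cross-correlation (what CNN layers call convolution).

    volume: (z, y, x) array, e.g. a stack of CT slices.
    kernel: (kz, ky, kx) array; kz > 1 means each output voxel sees neighbouring slices.
    """
    kz, ky, kx = kernel.shape
    oz = volume.shape[0] - kz + 1
    oy = volume.shape[1] - ky + 1
    ox = volume.shape[2] - kx + 1
    out = np.zeros((oz, oy, ox), dtype=np.float64)
    for z in range(oz):
        for y in range(oy):
            for x in range(ox):
                # Weighted sum over the kz x ky x kx window starting at (z, y, x)
                out[z, y, x] = np.sum(volume[z:z + kz, y:y + ky, x:x + kx] * kernel)
    return out


# Hypothetical toy volumes, 5 slices of 5x5 pixels each
blob = np.zeros((5, 5, 5))
blob[2, 2, 2] = 1.0  # compact in z, y and x (nodule-like)
tube = np.zeros((5, 5, 5))
tube[:, 2, 2] = 1.0  # extends through every slice (vessel-like)

# On the middle slice the two volumes are indistinguishable
assert np.array_equal(blob[2], tube[2])

# 2D-style kernel: depth 1, so it only ever sees one slice
k2d = np.zeros((1, 3, 3))
k2d[0, 1, 1] = 1.0

# 3D kernel: rewards a bright centre, penalises brightness directly above/below in z
k3d = np.zeros((3, 3, 3))
k3d[1, 1, 1] = 1.0
k3d[0, 1, 1] = -1.0
k3d[2, 1, 1] = -1.0

print(conv3d_valid(blob, k2d)[2, 1, 1], conv3d_valid(tube, k2d)[2, 1, 1])  # 1.0 1.0
print(conv3d_valid(blob, k3d)[1, 1, 1], conv3d_valid(tube, k3d)[1, 1, 1])  # 1.0 -1.0
```

The per-slice kernel gives the same response for both shapes, while the 3D kernel separates them, which is roughly the "scroll through the stack" information a radiologist uses.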

alexandrecc · October 24, 2017, 1:56pm

I am still potentially interested in collaborating with you, @jeremy, even though I currently have some other active projects. Everything started here last year for me.

divyansh · October 24, 2017, 3:46pm

The project sounds interesting, and I’d also like to be part of it.

jamesrequa · November 5, 2017, 5:07am

I just started part 1 v2 but would definitely be very interested in participating in this project!

ar_ai · November 5, 2017, 5:22am

I am also interested in joining the project when it kicks off. Meanwhile, I will prepare by going through the preprocessing Jupyter notebook.

jeremy · November 19, 2017, 3:00am

If anyone gets through the preprocessing and starts training some models on the data science bowl competition data, please do let us know! That’s the starting point we need to get to before we can start making progress on this.

KevinB · November 19, 2017, 4:00am

Do you have a link to the data science bowl competition? I am interested in looking into this at least and contributing if possible.

ar_ai · November 19, 2017, 4:05am

I am currently using Brad Kenstler’s notebook for preprocessing the images:
http://nbviewer.jupyter.org/github/bckenstler/dsb17-walkthrough/blob/master/Part%201.%20DSB17%20Preprocessing.ipynb

shushi2000 · November 20, 2017, 11:52pm

I am using this kernel to preprocess the files, and it works fine (with a small change suggested in the comments). But I am not sure whether I should compute segment_lung_mask for each image. I think the answer is no, because as long as I have the processed images as NumPy arrays, I can start running a CNN on them. Just want to confirm before moving on…

rikiya · November 21, 2017, 7:38am

I’m on Part 1 v2 now, and I would definitely be very interested in this project, because I’m also a diagnostic radiologist, and this particular competition was exactly what made me realize the importance of learning DL more practically.

shushi2000 · November 21, 2017, 7:21pm

Hi @jeremy, would you please clarify the purpose of this project again? I am very interested, but I’m not sure what we want to achieve here.

If the purpose of this project is to use the large dataset you have to train a model and get good performance, then we could just replicate the winner’s model, correct? Or are you trying to develop better models with better architectures?

davecg · November 22, 2017, 1:11am

If you can’t beat ’em, join ’em. (I’m one too.)

jeremy · November 22, 2017, 1:15am

I’d like to improve on the winner’s model, particularly by using more data. But I think replicating their model is a necessary first step, and also really understanding it.

rikiya · November 22, 2017, 5:11am

Sounds interesting.
Who do you mean by “them”? “grt123”, by any chance?

nicole.bussola · November 22, 2017, 3:57pm

Please count me in!

suryatk · November 22, 2017, 10:01pm

@jeremy, @alexandrecc: I am definitely interested! Are there any resources to better understand the domain, the task, and the technical terms?