Lung cancer is of course a very broad subject. Put very simply, detecting lung nodules/cancers while they are still small can save lives. When we image patients whose nodule/cancer is still asymptomatic, we call this screening.
You can read an interesting review about lung nodule screening here:
Lung-RADS is a screening reporting and data system defined by the ACR (American College of Radiology):
Summary table : https://www.acr.org/~/media/ACR/Documents/PDF/QualitySafety/Resources/LungRADS/AssessmentCategories.pdf
If we focus on lung nodules detected incidentally on CT scans, the most important and up-to-date clinical publication is:
Fleischner guidelines 2017 : http://pubs.rsna.org/doi/abs/10.1148/radiol.2017161659 (msg me for some help to get the .pdf)
Summary table of Fleischner 2017 : http://www.nucsradiology.com/fleischner-society-2017-guidelines/
Initial 2005 publication to introduce the concepts : http://pubs.rsna.org/doi/pdf/10.1148/radiol.2372041887
The historical usage scenario for this problem was therefore to develop an automated tool that detects nodules on CT scans with good sensitivity and specificity (high test accuracy, or high area under the ROC curve). We commonly call these tools CAD (computer-aided detection). Unfortunately, classical CAD systems have high sensitivity but low specificity; consequently, they are not seriously used in high-volume practice. You can see a review of this subject here:
The Luna Challenge focused on 1) detection of nodules and 2) false positive reduction (i.e. higher specificity):
The goal of the Kaggle Data Science Bowl 2017 (https://www.kaggle.com/c/data-science-bowl-2017) was to create an automated method capable of determining whether or not a patient will be diagnosed with lung cancer within one year of the date the scan was taken. Of course, the winning methods analyzed the nodules/masses in the dataset. Unfortunately for that competition, the training labels weren’t localized on the 3D data. The task was challenging but not that useful from a clinical perspective. Maybe it was a commercially oriented challenge meant to find good prospects (developers and models) for the development of a real training/application.
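To make the sensitivity/specificity trade-off of CAD tools mentioned above more concrete, here is a minimal sketch in plain Python. The labels and scores are made up for illustration (they do not come from any of the datasets mentioned), and the function names are my own. It assumes each candidate has a score where higher means "more likely a true nodule", and that there is at least one positive and one negative candidate.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive candidate scores higher than a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up candidates: 1 = true nodule, 0 = false candidate (e.g. a vessel).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]

for t in (0.25, 0.5, 0.75):
    sens, spec = sensitivity_specificity(labels, scores, t)
    print(f"threshold={t:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"AUC={roc_auc(labels, scores):.2f}")
```

On this toy data, a low threshold catches every nodule but flags two of the five false candidates, while a high threshold removes all false positives at the price of missing one nodule. False positive reduction amounts to pushing specificity up without giving away sensitivity.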
I think the main historical usage scenario is still important: automatically detecting nodules on CT scans and selecting the one(s) with the highest probability of cancer based on all the features. With strong evidence of performance, such a tool could completely change current practice (detecting smaller worrisome nodules, less follow-up for larger nodules with a low probability of cancer). As I already said in a previous post, the most important size range for a deep learning application is between 5 and 10 mm on CT. Under 5 mm, our technology (biopsy or PET-CT) can’t confirm a cancer that small, and surgeons (currently) won’t treat a patient based on a probability of cancer determined by a deep learning model (no matter the performance). Above 10 mm, the diagnosis is usually relatively straightforward with current technology.
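As a rough sketch of that selection step: the snippet below groups nodules into the three size ranges described above and picks the one with the highest predicted probability. The `Nodule` class, the `cancer_probability` field and the function names are hypothetical; the probabilities would come from whatever model is used, and the values here are made up. I assume the 5 and 10 mm boundaries both belong to the middle range.

```python
from dataclasses import dataclass


@dataclass
class Nodule:
    diameter_mm: float
    cancer_probability: float  # hypothetical model output in [0, 1]


def size_range(nodule):
    """Map a nodule to one of the three size ranges discussed above."""
    if nodule.diameter_mm < 5:
        return "under_5mm"   # too small to confirm with biopsy or PET-CT
    if nodule.diameter_mm <= 10:
        return "5_to_10mm"   # range where a model could help the most
    return "over_10mm"       # diagnosis usually relatively straightforward


def most_suspicious(nodules, size=None):
    """Return the nodule with the highest predicted probability,
    optionally restricted to one size range; None if there is no candidate."""
    candidates = [n for n in nodules if size is None or size_range(n) == size]
    return max(candidates, key=lambda n: n.cancer_probability, default=None)


# Made-up example: four nodules found on one scan.
nodules = [Nodule(3.0, 0.9), Nodule(7.5, 0.6), Nodule(9.0, 0.8), Nodule(14.0, 0.7)]
print(most_suspicious(nodules, size="5_to_10mm"))
```

Restricting the ranking to the 5 to 10 mm range keeps the output focused on nodules where the prediction could actually change management.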
An open source project with high test performance could have a high impact in countries with a low development index, letting them implement CT screening programs at very low cost. Of course, this assumes the population has at least minimal access to a CT scanner. There is an open source project for lung cancer screening with chest X-ray: https://aiai.care/
But unfortunately, human readers are not very effective at detecting small lung cancers on a chest X-ray.
An open source project could also have a high impact in countries with a high development index, by lowering the cost of CT lung screening so that it can be offered to the entire population. For example, in Canada, with its universal public system, starting a country-wide screening program would require many more radiologists and carry a very high program cost.
I agree with @jeremy that replicating the winning Data Science Bowl entry is a good start. I am still offering to volunteer as a radiologist (e.g. labeling and localizing nodules) if this is set up as an open source project. Validating the model against many different radiologists will eventually also be quite important for it to gain enough credibility to be used in practice. I could help on that side too if needed.
I hope this helps.