Bone X-Ray Deep Learning Dataset and Competition

Stanford just publicly released a large bone X-ray dataset (MURA). There is also a related competition to determine whether a study is normal or abnormal. I hope some of you will be interested in participating!


Do you think the fact that many of the positively labeled images show hardware (e.g. screws) is going to be a problem?


That was a problem with the CheXray dataset: when we tried to classify a pathological sign (e.g. pneumothorax), the model converged on more easily detected but statistically correlated features (e.g. a chest tube).

It could also be a serious problem if we try to classify fractures specifically. The model could converge on the therapeutic hardware instead of actually identifying the fracture.

But the proposed metric is based on a normal-vs-abnormal result, which can be useful for triage. In that case, if the model extracts meaningful features for a normal representation, it is less important to know why a study is abnormal (fracture vs. orthopedic hardware). But if the model is trained almost exclusively on abnormal cases that contain orthopedic hardware, that bias will be baked into the model and it will not be able to detect initial fractures. The most important and prevalent use case for these types of bone X-rays is to detect fractures on the first, initial radiograph. For a potentially useful clinical application, it would consequently be much more useful to train the model on normal cases vs. initial, untreated fracture cases (without orthopedic hardware).


One needs to take into consideration that a study consists of several images, and that you evaluate a study and not individual images. A study can have 1-n images.

"Your program should output binary predictions for every study in the input file (not every image)." — from the submission instructions
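To make the study-level requirement concrete, here is a minimal sketch of collapsing per-image probabilities into one binary prediction per study, using the simple mean the paper reports (the function name and threshold are my own assumptions):

```python
import numpy as np

def predict_study(image_probs, threshold=0.5):
    """Aggregate per-image abnormality probabilities into a single
    binary prediction for the whole study (a study has 1-n images).
    Simple mean aggregation, as described in the paper."""
    return int(np.mean(image_probs) > threshold)

# A study with three views: two clearly abnormal, one borderline
print(predict_study([0.9, 0.8, 0.4]))  # -> 1
```

Any other pooling (max, weighted mean, etc.) slots into the same place; only the aggregation line changes.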


It seems that, according to the paper, hardware is considered an abnormality:

2.3 Abnormality Analysis
To investigate the types of abnormalities present in the dataset, we reviewed the radiologist reports
to manually label 100 abnormal studies with the abnormality finding: 53 studies were labeled with
fractures, 48 with hardware, 35 with degenerative joint diseases, and 29 with other miscellaneous
abnormalities, including lesions and subluxations.


That is an extremely important point that you highlight, @melonkernel.

The paper uses a simple average of the predictions from the different images. This is an easy, but likely underfitting, ensemble approach to the problem.

Hint: any radiologist with a minimum of experience definitely correlates each image's "features" across views in the spatial domain to improve their own diagnostic accuracy.

Beyond training a very good 2D deep CNN classifier, this potential 3D correlation is, IMHO, needed to create the most accurate model. Applying a random forest to the high-level semantic features extracted from all images is interesting, but would likely miss most of the fine-grained spatial correlation. A random forest combining the mid-level and high-level features of all the images could be worth a try, but the number of features would be incredibly high, with a very probable overfitting problem. Capsule networks probably hide a gem somewhere for this 3D correlation, since they encode the spatial relationships between features; I have been thinking about this for months, but unfortunately without any practical, mature solution…

@Judywawira @rikiya: still interested in creating a team?


Thanks @alexandrecc, I’m very interested in this :wink:

As you mentioned, handling multi-view images will be one of the key challenges in this competition. A weighted average based on output probabilities would be one idea to try: something like putting higher weights on probabilities close to 1 (and/or 0), as a way of taking uncertainty into account. But unfortunately I don't have a neat solution for this.
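One way to sketch that weighting idea: score each image by its distance from 0.5, so confident predictions (near 0 or 1) dominate uncertain ones. This is just an illustration of the scheme described above, not a tested recipe; the function name and the epsilon are my own assumptions:

```python
import numpy as np

def weighted_study_prob(probs, eps=1e-6):
    """Confidence-weighted mean of per-image probabilities:
    each image's weight is its distance from 0.5, so images the
    model is sure about (near 0 or 1) count more than uncertain ones."""
    probs = np.asarray(probs, dtype=float)
    weights = np.abs(probs - 0.5) + eps  # eps avoids an all-zero weight sum
    return float(np.sum(weights * probs) / np.sum(weights))

# One confident abnormal view outweighs two borderline views
print(weighted_study_prob([0.95, 0.55, 0.45]))
```

Compared with a plain mean (0.65 here), the weighted estimate is pulled toward the confident 0.95 prediction.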


Hi @alexandrecc and team,

I’d be interested in working on this problem with a team like yours. Are you looking for another helping hand for this challenge?


I would be interested in forming a team.
Although this is the first time I've tried working with X-ray images, it resonates with me very well. (Using AI to help people is the reason I got into this.)

I have been thinking about a couple of options; one might perhaps combine them in the end as an ensemble of sorts.

Option 1
Since X-rays are grayscale, you would not need an RGB tensor, but I am thinking one could combine all the views (images) into one tensor. One problem is that there are different numbers of images per study. Also, some of the X-rays were white while others were black, so perhaps one would need to normalize these by inverting them.

Option 2
Many-to-one classification, with an RNN or equivalent.

Option 3
Averaging the results of each image as in the paper.

Option 4
Adding embeddings for the extremity type (wrist, shoulder, etc.).
Although this might be picked up by the network anyway, so I am not sure it is needed.

Option 5.
Averaging 1-4 to give final result
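A minimal sketch of the polarity normalization mentioned in Option 1: flip white-background images so every input shares the same polarity. The mean-intensity heuristic and the function name are my own assumptions, not anything from the dataset documentation:

```python
import numpy as np

def normalize_polarity(img):
    """If the image looks inverted (mostly white background),
    flip it so all X-rays share a dark-background polarity.
    img: 2D uint8 array with values in 0..255."""
    if img.mean() > 127:      # mostly white -> assume inverted exposure
        img = 255 - img
    return img

dark = np.full((4, 4), 30, dtype=np.uint8)    # dark background, left as-is
light = np.full((4, 4), 220, dtype=np.uint8)  # white background, gets inverted
print(normalize_polarity(light).mean())  # -> 35.0
```

A global mean threshold is crude; checking only border pixels (where the background lives) would probably be more robust.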

Tell me if this doesn't make sense.

@alexandrecc, with 3D correlation, do you mean that if there is a probable abnormality in, let's say, the index finger's middle joint on image 1, and image 2 also shows a probable abnormality in the same place (from a different angle), the model would give that abnormality higher importance? Or do you mean that, as in one's mind, you build 3D "layers" from the 2D images?


Still interested, @alexandrecc; looking forward to meeting in person on Friday.

Is there anyone working on this problem?

Yes, we currently have a relatively large group working on this problem. @jeremy

How can I join this group, @alexandrecc?


Hello @alexandrecc. Did you download the MURA dataset? The online form is not working. How to get the dataset?

Hi @pierreguillou ,

Yes, I have had the dataset since last year. I guess you can contact the Stanford team if the online form isn't working. The research agreement doesn't allow transferring their dataset between individuals.

Thanks Alexandre. I sent an email to the Stanford team and I'm waiting for their answer.

[ EDIT ] : I received the email from ML Stanford and downloaded the MURA database :slight_smile:

Hi. I just published my medium post + jupyter notebook about the MURA competition.

My goal was to assess how far the standard fastai method could go in the search for better accuracy/kappa in the radiology domain, without any knowledge of radiology.

However, to go beyond a kappa of 0.642 (my score with the standard fastai method), I think I need a more complete understanding of radiology and more DL experiments.
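For anyone new to the metric: Cohen's kappa measures agreement with the labels after correcting for chance agreement, which is why it is stricter than plain accuracy. A self-contained sketch for the binary case (the labels below are made up for illustration):

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary 0/1 labels:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n       # observed agreement
    p_true = sum(y_true) / n                                   # label positive rate
    p_pred = sum(y_pred) / n                                   # prediction positive rate
    pe = p_true * p_pred + (1 - p_true) * (1 - p_pred)         # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical study-level predictions vs. ground truth (1 = abnormal)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
print(cohen_kappa(y_true, y_pred))  # -> 0.5 (accuracy here is 0.75)
```

`sklearn.metrics.cohen_kappa_score` computes the same quantity (and handles more than two classes).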

Feedback welcome!


Excellent work. I would be very interested in experts sharing some advanced techniques and optimizations for your notebook.


Part 2 of my journey in Deep Learning for medical images with the fastai framework on the MURA dataset.

I got a better kappa score but I need radiologists to go even further (and fastai specialists too :slight_smile: ).
Please, feel free to use (and improve) my notebook (ensemble models, squeezenet models, etc.).


Thank you @matejthetree. I just posted the part 2 of my research on the MURA dataset.
Feedback welcome to go further :slight_smile:
