Bone X-Ray Deep Learning Dataset and Competition


(Alexandre Cadrin-Chênevert) #1

Stanford has just publicly released a large bone X-ray dataset. There is also a related competition to determine whether a study is normal or abnormal. I hope some of you will be interested in participating!

https://stanfordmlgroup.github.io/competitions/mura/


(Jeremy Howard) #2

Do you think the fact that many of the images with positive labels show hardware (e.g. screws and such) is going to be a problem?


(Alexandre Cadrin-Chênevert) #3

That was a problem with the CheXray dataset: when we were trying to classify a pathological sign (e.g. pneumothorax), the model was converging on more easily detected but statistically related features (e.g. a chest tube).

It could also be a serious problem if we try to classify fractures specifically. The model could converge on therapeutic hardware instead of really identifying the fracture.

But the proposed metric is based on normal vs. abnormal results, which can be useful for triage. In that case, if the model extracts meaningful features for a representation of normality, it is less important to know why a study is abnormal (fracture vs. orthopedic hardware). But if the model is trained almost exclusively on abnormal cases with orthopedic hardware, that bias will be baked into the model and it will not be able to detect initial fractures. The most important and prevalent use case for these types of bone X-rays is to detect fractures on the first, initial radiograph. For a useful clinical application, it would consequently be far more valuable to train the model on normal cases vs. initial, untreated fracture cases (without orthopedic hardware).


(Christoffer Björkskog) #4

One needs to take into consideration that a study consists of several images, and that you evaluate a study rather than individual images. A study can have 1 to n images.

“Your program should output binary predictions for every study in the input file (not every image)” (from the submission instructions).


(Christoffer Björkskog) #5

It seems that, according to the paper, hardware is considered an abnormality:

2.3 Abnormality Analysis
To investigate the types of abnormalities present in the dataset, we reviewed the radiologist reports
to manually label 100 abnormal studies with the abnormality finding: 53 studies were labeled with
fractures, 48 with hardware, 35 with degenerative joint diseases, and 29 with other miscellaneous
abnormalities, including lesions and subluxations.


(Alexandre Cadrin-Chênevert) #6

That is an extremely important point that you underline @melonkernel.

The paper uses a simple average of the predictions coming from the different images. This is an easy, but likely underfitting, ensemble approach to the problem.
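For reference, the paper's per-study aggregation can be sketched like this (a minimal sketch; the function and variable names are my own, not from the paper's code):

```python
def study_prediction(image_probs, threshold=0.5):
    """Average the image-level abnormality probabilities of one study
    and binarize at `threshold` (1 = abnormal, 0 = normal).
    `image_probs` is a list of per-image probabilities."""
    mean_prob = sum(image_probs) / len(image_probs)
    return int(mean_prob > threshold), mean_prob

# Example: three views of the same study
label, prob = study_prediction([0.2, 0.9, 0.7])  # mean 0.6 -> abnormal
```

Note that, per the submission instructions, only this study-level label is submitted, not the per-image outputs.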

Hint: any radiologist with a minimum of experience definitely correlates the 3D validity of each image's “features” in the spatial domain to improve their own accuracy.

Beyond training a very good 2D deep CNN classifier, this potential 3D correlation is, imho, needed to create the most accurate model. Applying a random forest to the high-level semantic features extracted from all images is interesting, but would likely miss most of the fine-grained spatial correlation. A random forest combining the mid-level and high-level features of all the images could be worth a try, but the number of features would be incredibly high, with a very probable overfitting problem. Capsule networks probably inherently hide a gem somewhere for this 3D correlation by encoding the spatial relationships between features; I have been thinking about this for months, but unfortunately without any practical, mature solution so far.

@Judywawira @rikiya: still interested in creating a team?


(Rikiya Yamashita) #7

Thanks @alexandrecc, I’m very interested in this :wink:

As you mentioned, handling multi-view images will be one of the key challenges in this competition. A weighted average based on the output probabilities would be one idea to try; I mean something like putting higher weights on probabilities close to 1 (and/or 0), a way of taking uncertainty into account. But unfortunately I don't have a neat solution for this.
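A minimal sketch of that confidence-weighted idea (my own naming; the distance-from-0.5 weight is just one possible choice for an uncertainty measure):

```python
def weighted_study_prob(image_probs, eps=1e-6):
    """Combine per-image probabilities, giving more weight to confident
    predictions, i.e. those far from the uncertain point 0.5.
    `eps` keeps the weights from being all zero."""
    weights = [abs(p - 0.5) + eps for p in image_probs]
    return sum(w * p for w, p in zip(weights, image_probs)) / sum(weights)

# A confident 0.95 dominates an uninformative 0.5:
# weighted_study_prob([0.5, 0.95]) is roughly 0.95,
# whereas a simple mean would give 0.725.
```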


(Phani Srikanth) #8

Hi @alexandrecc and team,

I’d be interested in working on this problem with a team like yours. Are you looking for another helping hand for this challenge?

Best,
Phani.


(Christoffer Björkskog) #9

I would be interested in forming a team, although this is the first time I've worked with X-ray images. (It resonates with me very well; using AI to help people is the reason I got into it.)

I have been thinking about a couple of options; one might perhaps combine them in the end as an ensemble of sorts.

Option 1
Since X-rays are grayscale, you would not need an RGB tensor, but I am thinking one could combine all the views (images) into one tensor. One problem is that there are different numbers of images per study. Also, some of the X-rays were white while others were black, so perhaps one would need to normalize these by inverting.
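A hedged sketch of that inversion idea (the mean-intensity cutoff of 0.5 is just an assumed heuristic, not something from the dataset documentation):

```python
import numpy as np

def normalize_polarity(img):
    """If a film looks mostly bright, assume its polarity is inverted
    and flip it, so all inputs share one convention (bone bright on a
    dark background). `img` is a 2-D uint8 array; the 0.5 cutoff on
    mean intensity is an assumed heuristic."""
    arr = img.astype(np.float32) / 255.0
    if arr.mean() > 0.5:   # mostly white -> treat as inverted
        arr = 1.0 - arr
    return arr
```

In practice one would want to eyeball a sample of films to check whether a simple mean-intensity cutoff actually separates the two polarities.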

Option 2
Many-to-one classification, with an RNN or equivalent.

Option 3
Averaging the results of each image, as in the paper.

Option 4
Adding embeddings for the extremity type (wrist, shoulder, etc.).
Although this might be picked up by the network anyway, I am not sure it is needed.

Option 5
Averaging 1-4 to give the final result.

Tell me if this doesn't make sense.

@alexandrecc, by 3D correlation, do you mean that if there is a probable abnormality in, say, the middle joint of the index finger on image 1, and image 2 also shows a probable abnormality in the same place (from a different angle), then that abnormality would be given higher importance? Or do you mean that in one's mind you build 3D “layers” from the 2D images?


#10

Still interested, @alexandrecc. Looking forward to meeting in person on Friday.


(Anumula Muralidhar) #11

Is there anyone working on this problem?


(Alexandre Cadrin-Chênevert) #12

Yes, we currently have a relatively large group working on this problem. @jeremy


(Anumula Muralidhar) #13

How can I join this group, @alexandrecc?