Stanford MURA (X-Ray) Classification Competition

Hey, fellow fastai students, I am currently working on the Stanford MURA competition. It is basically an upper extremity X-Ray binary classification dataset. If you are interested, I would love to brainstorm with you.

Here are a few things I have discussed with other students about the competition:


  • The basic training unit is not an individual X-Ray image but an individual X-Ray study, which can consist of multiple images, each representing a different view of the same body part. Individual images do not have labels; only studies do.
  • The images inside the dataset have varying shapes, dimensions, and padding sizes. In short, they are not close to being standardized.

Possible approaches:

  • Label each individual image with the label of the study it belongs to, then train the model using individual images as the basic training unit.
    • Pro: Easy to start. ImageDataBunch works out of the box. At inference time, we can take in all the images of the study and aggregate the per-image predictions into a final result.
    • Con: It could be the case that even in positive studies, not all the images look positive, i.e., abnormal. The body part might look abnormal from only one perspective but perfectly normal from all others, so it is effectively impossible for the model to tell that such an individual image is abnormal. As a result, much of the training data will be mere noise.
  • Merge all the images in a study into one single image.
    • Pro: Easy to start. ImageDataBunch works out of the box. Inference is trivial as well, because the basic training unit is now correctly an individual study.
    • Challenges: What would be a good way to merge the images together?
  • Build a custom architecture that actually takes in multiple images as input.
    • Challenges
      • Architectural design
      • Studies have varying numbers of images
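For the third approach, here is a minimal PyTorch sketch of one way to handle a variable image count. Note that `StudyClassifier` and its masking scheme are my own assumptions, not something from the thread, and the tiny conv encoder is just a placeholder for a pretrained backbone. The idea: encode every image with a shared encoder, mask out padding slots, and max-pool the features across a study's images so any image count produces one study-level logit.

```python
import torch
import torch.nn as nn

class StudyClassifier(nn.Module):
    """Hypothetical sketch: shared per-image encoder + max-pool over a study.

    The tiny conv stack below stands in for a pretrained backbone."""

    def __init__(self, feat_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, images, mask):
        # images: (batch, max_imgs, 1, H, W) padded per batch
        # mask:   (batch, max_imgs), 1 for real images, 0 for padding
        b, m, c, h, w = images.shape
        feats = self.encoder(images.view(b * m, c, h, w)).view(b, m, -1)
        # Padding slots get -inf so the max-pool ignores them.
        feats = feats.masked_fill(mask.unsqueeze(-1) == 0, float("-inf"))
        pooled = feats.max(dim=1).values
        return self.head(pooled).squeeze(-1)  # one logit per study
```

Max-pooling also matches the intuition from the first approach's con: a study is abnormal if *any* view looks abnormal, so taking the max over views lets one abnormal view dominate.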

General Strategies

  • You can train a single end-to-end deep learning system that takes in an X-Ray image of an arbitrary body part and tells whether it is abnormal.
  • Since at inference time the body part of the image is given, we can instead train a separate CNN model for each body part.
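The second strategy amounts to a simple dispatch at inference time. A sketch, where the `BODY_PARTS` list reflects the seven upper-extremity parts in MURA but the `predict_study` helper and `models` dict are purely illustrative:

```python
# The seven body parts covered by MURA.
BODY_PARTS = ["ELBOW", "FINGER", "FOREARM", "HAND", "HUMERUS", "SHOULDER", "WRIST"]

def predict_study(study_images, body_part, models):
    """Route a study to the specialist model trained for its body part.

    models: dict mapping a body-part name to a callable that takes the
    study's images and returns an abnormality probability.
    """
    if body_part not in models:
        raise KeyError(f"no model trained for body part {body_part!r}")
    return models[body_part](study_images)
```

Since the MURA file paths encode the body part, the routing key is available for free for both training and inference.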


The leaderboard uses Cohen’s Kappa Coefficient, which is a much stricter metric than plain accuracy. An accuracy of 0.825 might correspond to a kappa of only 0.626.
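To see why kappa is stricter, here is a minimal from-scratch sketch of Cohen’s kappa. (The exact 0.825 → 0.626 relationship depends on the label distribution of the hidden test set; the toy numbers below are just illustrative.)

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for hard predictions: observed agreement corrected for
    the agreement expected by chance given each side's label frequencies."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
              for c in set(y_true) | set(y_pred))          # chance agreement
    return (p_o - p_e) / (1 - p_e)

# On imbalanced labels, a decent-looking accuracy shrinks considerably:
y_true = [0] * 6 + [1] * 4
y_pred = [0] * 8 + [1] * 2   # accuracy 0.8, but kappa ~ 0.545
```

Because kappa subtracts the agreement a majority-class guesser would get for free, it punishes models that lean on class imbalance.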

Kappa is implemented in fastai as the `KappaScore` metric in `fastai.metrics`.

Kaggle Dataset

I just uploaded the dataset to Kaggle for your convenience. However, since you actually need to sign an agreement online to get the dataset, I decided not to make it public so that Stanford won’t sue me :thinking: I am kidding :rofl: I know they won’t, but just in case.

As a result, I now need to invite you to the dataset for you to get access to it. Please reply with your Kaggle username so I can send the invitations.


Thanks @PegasusWithoutWinds! I am very much interested in brainstorming and collaborating with you on this competition.

Hey, @rsrivastava, what time zone are you in? We can have an audio call to get it kickstarted.

I am in the PST timezone. We can do a WhatsApp call.

I am also interested and would like to participate. I am also in PST. I can bring some domain expertise as a musculoskeletal radiologist.

Oh, that would be wonderful! We are dying for a domain expert. What would be a good time for us to give you an update on the current status of the work?

I am available most evenings, 6-10pm PST.

@rsrivastava Would you like to join?

@PegasusWithoutWinds and @agentili Yes, I would like to join. When are we meeting, and how?

I’m interested too.

Ah, you are most welcome to join! How can we reach you?

I would love to contribute if I can help somehow. I don’t have much domain expertise in this field, but I have some knowledge of deep learning.

I am interested, how can I join you?

So glad to see all the enthusiasm out there! Here is a Google Hangout invitation link.

We can use it to start our meeting at 8:30 pm PST on 03/17. Let me know if Google Hangouts does not work for you.


I’m interested too. I’ve worked on a generic “multi image input” DataBunch for the “human protein atlas” competition.
It would need to be tuned to accept grayscale images instead of RGB, and extended to support “missing” images for studies where not all “views” are present.
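Supporting “missing” views usually comes down to the batch collation step. A sketch of one way to do it, assuming each sample is a `(images, label)` pair with `images` shaped `[num_imgs, C, H, W]` (the `pad_collate` name and sample layout are my assumptions, not the actual DataBunch code):

```python
import torch

def pad_collate(batch):
    """Pad each study to the batch's maximum image count and return a mask
    marking which slots hold real images (1.0) vs. zero-padding (0.0)."""
    max_n = max(imgs.shape[0] for imgs, _ in batch)
    padded, masks, labels = [], [], []
    for imgs, label in batch:
        n, c, h, w = imgs.shape
        pad = torch.zeros(max_n - n, c, h, w)
        padded.append(torch.cat([imgs, pad], dim=0))
        masks.append(torch.tensor([1.0] * n + [0.0] * (max_n - n)))
        labels.append(label)
    return torch.stack(padded), torch.stack(masks), torch.tensor(labels)
```

A function like this can be passed as the `collate_fn` of a PyTorch `DataLoader`, and the mask lets the downstream model ignore the padded slots when pooling across views.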


Ah, this is great!

Here are some more notes I took for myself while playing around with the competition. I cannot guarantee their readability, as I did not originally intend them for a wide audience, but you are welcome to read them if you find them interesting.

Stanford MURA Competition.pdf (194.7 KB)


I’ve updated the code to work with 1.0.50.dev0 .


I am already here in the Hangout.

As a reminder, the link to join is:

Just in case any of you are interested in building the whole thing from raw PyTorch, here is a repo that might serve as a starting point: