Stanford MURA (X-Ray) Classification Competition

Hello @agentili. I understand, but if all X-rays have the same border (same size, same color, etc.), won’t the convnet learn not to use it when classifying images?

And if the border is not exactly the same for all images (in particular in size), how can you remove it in pre-processing without any risk of deleting useful information?

(Sorry for the naive questions, but as you can understand, I’m not a specialist in radiology :slight_smile: )

Hi @melonkernel. Great :slight_smile:
Did you post a notebook, or can you share the methodology you used with the previous fastai library?

Check out the paper I just posted here.

Sure.
I don’t think it was a particularly good approach, but I was experimenting with how to deal with the fact that you have an arbitrary number of images per study.
What I did was concatenate all the images in one study into one combined image, since a convnet doesn’t really mind where in the image a feature appears. This way any single view could trigger several different activations. But you will have problems with data augmentation (rotating, for instance, is less meaningful when applied to an image that itself contains several images… though now that I think of it, one could concatenate after applying the transforms to the individual images).
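In code, the idea looks something like this (my own sketch, not the actual experiment; the function name and sizes are mine):

from PIL import Image

# Tile all views from one study side by side into a single wide image,
# so a finding on any view can trigger the relevant activations.
# Concatenating after per-view transforms would keep augmentation sane.
def combine_study(image_paths, size=224):
    views = [Image.open(p).convert("L").resize((size, size))
             for p in image_paths]
    combined = Image.new("L", (size * len(views), size))
    for i, view in enumerate(views):
        combined.paste(view, (i * size, 0))
    return combined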
I also thought about some RNN alternative, or a multi-input model that could take n inputs, but I haven’t experimented with that yet.
I will share some code once I’ve cleaned it up a bit.

I am working on a different project whose images have a black background with multiple colored spectrogram patches…

I had the same observation as yours… I have noticed that a lot of the prediction-interpretation heatmaps fall mainly on the background… Even when I pass the training set instead of the validation set to the interpret method, ~20% of the images have much of their heatmap activation on the background instead of on the spectrograms, especially at the corners and edges of the images.

I have experimented with different types of backgrounds to see their effect on these heatmaps and on accuracy, and the background color does indeed seem to have an effect… Accuracy is not bad, around 75–85%… The color of the background, and whether it is randomly chosen or fixed for each training image, affects accuracy by ~10%…

Of everything I have tried, the black background had the least such effect and gave the best accuracy… So this issue of heatmaps landing mainly on the background and the accuracy are indeed related… It can give you some clue about how your model is doing… The best case is when no heatmap activation lands on the background at all…

Here are the background types I tried (a sketch of generating them follows the list):
Black
White
Grey (127, 127, 127)
Random grey for each image in the training set
Random colors (random R, random G, random B)
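Generating those background variants could look like this (my own helper, not the poster’s code; the spectrogram patches would then be pasted onto the returned canvas):

import random
from PIL import Image

# Build a square canvas of the requested background type.
def make_background(kind, size=224):
    if kind == "black":
        color = (0, 0, 0)
    elif kind == "white":
        color = (255, 255, 255)
    elif kind == "grey":
        color = (127, 127, 127)
    elif kind == "random_grey":   # a new grey level per training image
        g = random.randint(0, 255)
        color = (g, g, g)
    elif kind == "random_color":  # independent random R, G, B
        color = tuple(random.randint(0, 255) for _ in range(3))
    else:
        raise ValueError(f"unknown background kind: {kind}")
    return Image.new("RGB", (size, size), color)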

And indeed, cropping the images so that only a minimal background area remains seems to do better…

So I suggest doing bounding-box crops, something like Radek’s notebook.
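A minimal version of such a crop could look like this (my own sketch, not Radek’s actual notebook; it assumes grayscale images on a dark background, with thresh and margin as tunable assumptions):

import numpy as np
from PIL import Image

# Crop an image to the bounding box of its non-background pixels,
# keeping a small margin so nothing useful at the edge is cut away.
def crop_to_content(path, thresh=10, margin=8):
    img = np.array(Image.open(path).convert("L"))
    ys, xs = np.where(img > thresh)        # pixels brighter than background
    if len(ys) == 0:                       # blank image: return it unchanged
        return Image.fromarray(img)
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin, img.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin, img.shape[1])
    return Image.fromarray(img[y0:y1, x0:x1])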

I wonder what it means when the heatmap of the “top losses” images often falls outside the body part being imaged, even when we plot the heatmaps of the most confident predictions:

interp.plot_top_losses(42, largest=False)

The heatmaps are affected regardless of whether the interpretation object is built on the validation set:

interp = learn.interpret(DatasetType.Valid)

or on the training set:

interp_train = learn.interpret(DatasetType.Train)
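For what it’s worth, newer fastai v1 releases let you overlay the heatmap explicitly when plotting; something like this (the heatmap argument is my assumption about the installed version, so check your version’s signature):

# Compare where the activation heatmaps land for both datasets.
interp.plot_top_losses(9, largest=False, heatmap=True)
interp_train.plot_top_losses(9, largest=False, heatmap=True)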

I have observed that many correctly classified images have this issue too… So it is strange that the heatmaps fall mainly outside the object of interest…

Is it related to vanishing/exploding gradients?

But if the background is black in a certain area across the whole training set and all classes, shouldn’t the weights of the neurons whose receptive fields cover that area all be zero? How can the heatmaps be so high on those fixed black areas?
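One possible piece of the answer (a quick PyTorch check of my own, not from this thread): conv layers have bias terms and BatchNorm has a learned shift, so even an all-black input produces non-zero activations:

import torch
import torch.nn as nn

# A conv + BatchNorm + ReLU block applied to an all-black (all-zero)
# input still yields non-zero activations, via the conv bias and the
# BatchNorm shift, so a fixed black region need not mean zero activations.
block = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # bias=True by default
    nn.BatchNorm2d(8),
    nn.ReLU(),
)
block.eval()  # BatchNorm uses running stats, acting as a fixed affine map

black = torch.zeros(1, 3, 64, 64)
with torch.no_grad():
    out = block(black)
print(out.abs().max())  # generally > 0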

And what can we do to avoid such issues, since this is so closely related to model accuracy?


Hi Pierre, did you remove the rectangles containing words on the MURA images? Would you mind sharing how you did it? Thanks.

Hello, I need access to the dataset on Kaggle if it’s still available, please; I am working on a college project involving this dataset. My username on Kaggle is arwamohammed.

Kaggle username: yashbhojwani
I would like to join.