I’m trying to classify breast density (4-class classification) from mammograms. I have only about 6,000 images in the training set.
I’m currently training a CNN on 512x512 inputs (high-resolution mammograms resized to 512x512 PNGs). I get an overall accuracy of ~65%, and roughly the same mean per-class accuracy, on the validation set.
Is the dataset too small? What would be the ideal dataset size?
What kind of data augmentation could I use for mammograms, keeping in mind that the task is breast density classification? I’ve tried random flipping, which helps a bit.
What are some other things I can try to improve performance?
Breast density classification from a human perspective doesn’t need much resolution. In the medical literature, inter-rater agreement among radiologists is usually poor, especially at the boundary between categories 2 and 3, so it will be hard to get a >90% accuracy model with that much ground-truth noise in the labels.
The standard resolution of the classic ImageNet-pretrained networks is probably enough (224x224 or 299x299); lower resolution could even act as a regularizer for this problem. I would add a lot of data augmentation (vertical/horizontal flips, 0–45 degree rotation, vertical/horizontal shift, 0.8–1.2 scale). Try the fine-tune-first approach with ImageNet weights: train only the new head, then train the entire network with a smaller learning rate. Use batch-norm layers with a batch size of 32–64. I don’t think the choice of optimizer will make a significant difference for this problem.
Is it right to do this, given that the image statistics of ImageNet and mammograms are quite different?
That is a good question. I would say: try it and see if it helps. The simplest way is to copy the single channel (grayscale image) into all 3 channels (RGB image). In many medical imaging problems there are different ways to benefit from ImageNet weights, frequently to speed up convergence, and sometimes to improve generalization and final performance.
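The channel-copying trick is just replicating the grayscale plane three times so the input matches what an RGB-pretrained backbone expects; a minimal NumPy sketch (the array here is a random placeholder for an actual mammogram):

```python
import numpy as np

# A single-channel mammogram (H, W), replicated into 3 identical channels
# so it matches the (H, W, 3) RGB input of ImageNet-pretrained networks.
gray = np.random.rand(512, 512).astype(np.float32)  # placeholder image
rgb = np.repeat(gray[..., None], 3, axis=-1)        # shape (512, 512, 3)
```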
It looks good, but I would not add that much translation (maybe 0.1, 0.1): you don’t want the breast parenchyma shifted out of the image, so that the gradients stay meaningful and the network can build a good representation of the data in its latent layers.
On the same topic, wouldn’t it make more sense to formulate this as a regression problem rather than a classification problem? Could I just map class (a) to 1, (b) to 2, and so on, and regress? Or is there a better way to do this?
Yes, it makes sense. You could even train both a classifier and a regressor and average the results. The gradients frequently propagate differently when backprop starts from a softmax output layer versus a sigmoid output layer; this lowers the correlation between the two models, and averaging them can give even better results.
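One way to combine the two heads at inference is to turn the softmax into an expected density score (probability-weighted class index on the 1–4 scale), average it with the regressor’s output, and round back to a class. A sketch, assuming a shared backbone with two heads (`DensityNet`, `combined_prediction`, and the tiny MLP backbone are all illustrative placeholders, not the poster’s architecture):

```python
import torch
import torch.nn as nn

class DensityNet(nn.Module):
    """Shared backbone with a 4-way classification head and a scalar
    regression head predicting the density score directly (1-4)."""
    def __init__(self, feat_dim=128, num_classes=4):
        super().__init__()
        # Placeholder backbone on 1-channel 224x224 input; in practice
        # this would be an ImageNet-pretrained CNN.
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(224 * 224, feat_dim), nn.ReLU()
        )
        self.cls_head = nn.Linear(feat_dim, num_classes)  # softmax logits
        self.reg_head = nn.Linear(feat_dim, 1)            # density score

    def forward(self, x):
        f = self.backbone(x)
        return self.cls_head(f), self.reg_head(f).squeeze(-1)

def combined_prediction(logits, reg_out):
    # Expected class index under the softmax, mapped to the 1-4 scale,
    # averaged with the regressor, then rounded back to a class label.
    probs = torch.softmax(logits, dim=-1)
    n = probs.shape[-1]
    classes = torch.arange(1, n + 1, dtype=probs.dtype, device=probs.device)
    cls_score = (probs * classes).sum(-1)
    avg = (cls_score + reg_out) / 2
    return avg.round().clamp(1, n).long()
```

The classifier would be trained with cross-entropy and the regressor with, e.g., MSE on the 1–4 targets; only the prediction-combining step is shown here.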