I’ve been working toward implementing the original version of MRNet using the fastai library, which has been challenging to say the least…
I’ve developed a custom ItemList subclass that loads an entire stack of images from one series (e.g. sagittal) into an Image object (the fastai version, not PIL.Image). I’ve created a custom DataBunch on top of that.
With just these changes, I’m pretty close to having a semi-functional solution, but ran into an issue with creating a custom Callback - though I think I’m close to figuring out a fix.
Nice, Walter. Let me know when you’re ready to have another pair of eyes on your implementation.
I’ve done some research around architectures and other libraries, especially looking for something ready to apply with volumetric data taken from multiple planes/angles. I have a few hacky ideas that may be worth exploring, which we can discuss once we’ve got the data together and have replicated a basic version of the prior work.
Looking forward to making more progress next week.
@nswitanek: Check out my PR (below) to @LessW2020 's repo for my implementation of the original MRNet model. The only key functionality I’ve yet to implement is data augmentation. It’s a bit hacky, but you can use the classes added in this PR to get a trainable model with (at least most of) the Learner functionality from fastai.
Hey Walter - thanks for the commits. I merged them into master along with your latest, so no worries.
I finished my work on activations (FTSwishPlus was the winner), so hopefully that may be of use for us.
I also did some indirect research/reading that may be helpful - there are two recent papers on new loss functions. The idea is that the loss function enforces regularization between the classification categories, which improves both validation accuracy and generalization:
1 - W-Loss:
2 - Centroid loss:
and specific to medical image segmentation:
And a way to check for ‘memorization’ in the CNN - by monitoring this, you can apparently achieve the optimal stopping point during training:
There is also a new convolution layer proposed by Microsoft Research that is designed to capture better generality:
Finally, I found the github code for a nice GAN setup for super-resolution:
Their paper was very slick - they showed that teaching a GAN to decompose images first, then to upscale, produced SOTA results vs. CycleGAN, etc.
I don’t know if super-resolution is something that will help us, but at least it’s there should we decide we need to add it into the processing.
I’m going to get the data set up on FloydHub and, after I finish some deadlines today, can hopefully start actively coding/testing tomorrow. I’ll start with Walter’s basic implementation initially, then run it with a simple XResNet for comparison.
I’ve done a first read through now, and plan to go through your implementation in more detail to understand it better and to suggest extensions and alternatives.
I’ve read through @neuradai’s code and the custom ItemList tutorial in the docs. Thanks again @neuradai for getting some working code together so quickly.
So we’re on the same page, here’s how I’m thinking we should implement the custom ItemBase and ItemList classes so we can take advantage of fastai’s data block API. Please comment or edit.
The ItemBase corresponds to the “item” that has a label associated with it. In the case of MRNet, it is the knee that gets the label, and the knee has three sets of images associated with it, each set taken from a different plane.
So we might implement a custom ItemBase to collect all the images from all three planes for a single knee, or “case.” In order to be used with pre-trained 2d image classifiers, the subclass could have defaults to provide just a single Image, say the central slice from the sagittal series, recast from grayscale to RGB.
What do you think?
If our custom ItemBase corresponds to a single knee, then our custom ItemList corresponds to a list of knees. @neuradai’s current implementation of MR3DImageList appears to be a list of images from a single knee. The code is still useful, but I think it belongs in a different class, and that we need a KneeList or a CaseList subclass of ItemList so that we’re fitting bunches of knees, not bunches of images from a single knee.
Please correct me if I’ve misread the code.
Do we agree that ItemBase should map to the images/scans of a single knee/case, and that ItemList maps to sets of knees/cases?
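To make the proposal concrete, here’s a torch-only sketch of the split. The class names `KneeItem`/`KneeList` and the method names are hypothetical; the real versions would subclass fastai’s ItemBase and ItemList respectively.

```python
import torch

# Hypothetical sketch of the proposed split; in practice KneeItem would
# subclass fastai's ItemBase and KneeList would subclass ItemList.
class KneeItem:
    "One case: the three series (axial, coronal, sagittal) for a single knee."
    def __init__(self, axial, coronal, sagittal):
        # each series is a tensor of shape (n_slices, h, w)
        self.series = {'axial': axial, 'coronal': coronal, 'sagittal': sagittal}

    def default_2d(self):
        "Default for pretrained 2D models: central sagittal slice, grayscale recast to RGB."
        sag = self.series['sagittal']
        mid = sag[sag.shape[0] // 2]               # central slice, (h, w)
        return mid.unsqueeze(0).expand(3, -1, -1)  # (3, h, w)

class KneeList:
    "A list of knees/cases, so batches are built from cases, not slices of one knee."
    def __init__(self, items): self.items = items
    def __len__(self): return len(self.items)
    def __getitem__(self, i): return self.items[i]
```

With this split, labeling and batching happen at the case level, and the central-slice default keeps compatibility with pretrained 2D classifiers.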
Also, FYI: I wrote the hosts of the MRNet competition and learned that there is no deadline to the competition. They intend to leave it as an open competition.
I think this is a great next step that would solve some of the problems with my implementation.
The MR3DImageList class I implemented is a little different from what you described. Let me first establish some terminology for clarity:
case: full set of image stacks (coronal, sagittal and axial) from a single knee
series: full stack of images from a single case in a single plane (coronal, sagittal or axial)
MR3DImageList takes a df of cases with labels. For each case, it loads one series (in my example, the sagittal series) into an Image object (an ItemBase subclass). The ImageList is thus composed of a single series per case. This is a bit hacky, because the methods of the Image subclass are really only designed to work with single images, not stacks of images - hence why I’ve yet to implement data augmentation. I agree that a new ItemBase subclass would add some much-needed functionality.
However, I did it this way initially, because it was the quickest means of getting to some version of the original MRNet implementation, in which they trained:
One model for each plane (coronal, sagittal and axial) for each task (abnormal, ACL, meniscus)
The models for each plane were ensembled for each task
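In sketch form (names and the mean combiner below are hypothetical stand-ins; the original paper combines the three per-plane outputs per task with a logistic regression):

```python
# Hypothetical sketch of the original MRNet training scheme: one model per
# plane per task (nine models), with per-plane predictions ensembled per task.
planes = ['axial', 'coronal', 'sagittal']
tasks = ['abnormal', 'acl', 'meniscus']

models = {(task, plane): None for task in tasks for plane in planes}  # 9 slots

def ensemble_predict(preds_by_plane):
    # combine the three per-plane probabilities for one task
    # (simple mean here; the original fits a logistic regression instead)
    return sum(preds_by_plane.values()) / len(preds_by_plane)
```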
The model architecture they implemented inputs each image from a series as an element of a minibatch. Their forward pass squeezes this “minibatch” into a 2D architecture and reduces the dimensionality of this input after the final AdaptiveAvgPool2D layer, as follows:
```python
def forward(self, x):
    # in the original code, the input was squeezed here, but this isn't sufficient for fastai
    x = self.model.features(x)               # backbone features for each slice
    x = self.gap(x).view(x.size(0), -1)      # AdaptiveAvgPool2d, flattened to (n_slices, n_features)
    x = torch.max(x, 0, keepdim=True)[0]     # combines 'data' from images in a series/minibatch
    return torch.sigmoid(self.classifier(x))
```
This allows them to maintain a one-to-one correspondence between a given series and its label, despite treating the series as a minibatch with respect to input. This also explains why only a minibatch of size 1 is supported – because a minibatch of any other size couldn’t be squeezed into a 2D architecture.
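A small runnable sketch of that reduction, with dummy tensors standing in for the real backbone and classifier, just to track the shapes:

```python
import torch

# Dummy numbers standing in for the real model: a series of 36 slices, each
# yielding a 256-dim feature vector after the backbone + AdaptiveAvgPool2d.
n_slices, n_features = 36, 256
features = torch.randn(n_slices, n_features)      # (36, 256): series treated as a minibatch

# max over the slice dimension collapses the series to one feature vector,
# restoring the one-to-one correspondence between a series and its label
pooled = torch.max(features, 0, keepdim=True)[0]  # (1, 256)

logit = pooled.sum(dim=1, keepdim=True)           # stand-in for self.classifier(pooled)
prob = torch.sigmoid(logit)                       # (1, 1): one prediction per series
```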
Hopefully, that explanation makes some sense. In the end, I still agree that it would be better to create a new ItemBase subclass better able to handle a series (or all three series) from one case, and to modify my MR3DImageList code to work with that class. I’m also happy for us to abbreviate the name to something like KneeList.
It’s a very simple model, but there is one beautiful aspect to it. If you look at the data, each subject has a different number of slices. I saw in a post above that @neuradai solved this problem by padding both sides of the image stack with zero arrays to reach max_slc. However, the authors used a very different approach here.
Suppose one of the volumes has 36 slices. Then they start with a tensor of size torch.Size([36, 256, 256]).
Then they stack each slice 3 times, which makes the data look like
torch.Size([36, 3, 256, 256])
Then they always fetch only one volume per batch with the data loader, so the shape of the batch looks like
(1, 36, 3, 256, 256). In other words, they always fetch the data for one whole volume at a time.
Now, inside MRNet, the very first thing they do is
x = torch.squeeze(x, dim=0), making the input of size (36, 3, 256, 256), which can be fed to a pretrained AlexNet as a batch of size 36.
In this way, they automatically achieve a solution for the variable number of slices.
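The shape bookkeeping above, as a runnable sketch:

```python
import torch

volume = torch.randn(36, 256, 256)          # one volume: 36 slices of 256x256
x = volume.unsqueeze(1).repeat(1, 3, 1, 1)  # each grayscale slice stacked 3x -> (36, 3, 256, 256)
batch = x.unsqueeze(0)                      # DataLoader batch of one whole volume: (1, 36, 3, 256, 256)

# first line inside MRNet's forward pass:
x = torch.squeeze(batch, dim=0)             # back to (36, 3, 256, 256): a "batch" of 36 slices
```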
I had been trying to solve this problem of variable batch size for a long time and found this solution very cool.
Yeah, it’s a really cool solution. Unfortunately, it doesn’t work with all of the built-in tests in fastai, hence why I used zero-padding in my implementation.
Also, I haven’t yet trained it for more than one epoch due to the current lack of data augmentation.
Did you consider interpolating the images so they all have the same number of slices? It may provide some data augmentation as a bonus. I would expect most knees to be of similar size, and the difference to be mostly due to different slice thicknesses during acquisition.
```python
from scipy import ndimage as nd

knee_img_data.shape  # original numpy array with the MR series in one plane

# resize to 64x256x256
dsfactor = [w / float(f) for w, f in zip([64, 256, 256], knee_img_data.shape)]
knee_img_resized = nd.zoom(knee_img_data, zoom=dsfactor)
```
@nswitanek, do you want to try this with your expansion of my work?
To the point about knees being the same size – in my review of outliers:
Some cases contain multiple “empty” slices (i.e. images within a series). These are sometimes black, but often noisy (snowstorm appearance).
There are many obese patients (this is the USA, after all) for whom there is a lot of extra skin and subcutaneous fat to cover in the scan.
Quote formatting utilized for emphasis:
Importantly, slice thickness should be the same on all of these scans because they come from the same institution and were all scanned with the same acquisition protocol.
Out of curiosity, I went back and tested my current implementation without zero-padding – and it worked!
In the original dev process, I incorporated zero-padding pretty early on, but it seems that the changes I made later are sufficient to get the true implementation of MRNet working in fastai.
Note: interpolation or “mirror” duplication would still be reasonable strategies to standardize slice numbers if we were to move to a 3D model arch.
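For reference, a quick sketch of both strategies on a dummy series (the target slice count of 48 is an arbitrary choice for illustration):

```python
import numpy as np
from scipy import ndimage as nd

series = np.random.rand(36, 256, 256)  # dummy series with 36 slices
target = 48

# option 1: interpolate along the slice axis only, leaving h and w untouched
zoomed = nd.zoom(series, zoom=(target / series.shape[0], 1, 1))

# option 2: "mirror" duplication - reflect slices at both ends of the stack
pad = target - series.shape[0]
mirrored = np.pad(series, ((pad // 2, pad - pad // 2), (0, 0), (0, 0)), mode='reflect')
```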