MRNet: Stanford Knee MRI Dataset And Competition

Discourse has been warning me that I’ve been posting too much :laughing:, so I’ve decided to use a Medium post as a sort of “lab notebook” for my data exploration (with included domain knowledge).

Update: My post was published by Towards Data Science. I’ve changed the link below to the friends link, so FastAI folks can bypass the paywall, if needed. Please don’t share this link beyond these forums, though.

For discussion of model architectures and the like, I’ll keep most of my responses here, rather than include that info in the post. Don’t want to give any of our competitors too much of an advantage…:wink:

2 Likes

Thanks, @neuradai.

I’m extending your EDA nb a little. Working from a subsample I see that the number of images per sequence varies across cases:

Is such variation expected? Why does it occur? Is the middle slice guaranteed to be centered in the same place across patients?

1 Like

Note that there are five distinct classes of cases:

[image: counts of cases in each of the five categories]

Cases that are considered Abnormal but have neither an ACL nor a Meniscus tear are the most common category. An ACL tear without a Meniscus tear is the least commonly occurring condition in the training sample.

2 Likes

That is expected, due to differences in patient size and the orientation of their leg during the scan. If you’re processing a batch of full sequences, you’ll need to pad the sequences for each plane with 256 x 256 zero arrays - probably on either side.
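
Here's a minimal sketch of what that padding might look like (pad_stack and target_slices are names I'm making up here, not from anyone's repo):

import numpy as np

def pad_stack(stack, target_slices):
    # Zero-pad a (s, 256, 256) stack on either side of the slice axis so every
    # case in a batch has the same depth. Assumes target_slices >= stack.shape[0].
    n_pad = target_slices - stack.shape[0]
    before = n_pad // 2          # zero slices added before the first slice
    after = n_pad - before       # zero slices added after the last slice
    return np.pad(stack, ((before, after), (0, 0), (0, 0)), mode='constant')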

There are no guarantees in medical imaging, so the mid-slice is not guaranteed to be centered on the knee joint in all planes for all patients. However, it is standard practice for the MR technologist to attempt this when acquiring the images in the coronal and sagittal planes. Furthermore, given the location of the structures of interest (ACL, medial and lateral menisci), it’s not guaranteed that we’ll see all of the structures in any one slice - let alone the middle slice for that sequence.

Also, note the “bad” data points section of my Medium post, which shows some images that were either poorly cropped during preprocessing by the Stanford ML Group or compromised by technologist error during acquisition.

I’ve added this bit of code to look at the minimum number of slices (centered around the middle slice) needed to include all of the relevant structures. In my limited exploration thus far, the answer is ~16, with the exception of the “bad” data points. Conveniently, this would also keep stack sizes the same for all planes, if the minimum-slice stats from your screenshot hold when applied to the entire data set.

def load_partial_stacks(case, data_path=train_path, slice_limit=None):
    """Load each plane's stack, keeping at most `slice_limit` slices centered on the middle slice."""
    x = {}
    planes = ['coronal', 'sagittal', 'axial']
    if not slice_limit:
        # No limit requested: fall back to loading the full stacks
        return load_stacks(case, data_path)
    for plane in planes:
        data = load_one_stack(case, data_path, plane)
        if slice_limit >= data.shape[0]:
            # The stack is already within the limit; keep it whole
            x[plane] = data
        else:
            # Take a window of `slice_limit` slices centered on the middle slice
            mid_slice = data.shape[0] // 2
            lower = mid_slice - (slice_limit // 2)
            upper = lower + slice_limit
            x[plane] = data[lower:upper, :, :]
    return x

1 Like

That suggests that this is a fairly realistic data set. Knee MRIs aren’t typically ordered on asymptomatic patients. And the most common abnormalities seen are within the cartilage and underlying bone, not the ligaments and menisci.

For these reasons, I think an ensemble of separate models for abnormality, ACL and meniscus would be a wise approach.

Edit: Also, ACL tears are very frequently associated with meniscus tears, due to the mechanisms of trauma required to cause an ACL tear.
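
For illustration, a rough sketch of how that ensemble might be wired up in fastai v1 (databunch_for_task is a hypothetical helper that builds a DataBunch labeled for a single task):

from fastai.vision import cnn_learner, models

# One independent binary classifier per task, rather than a single multi-label head
tasks = ['abnormal', 'acl', 'meniscus']
learners = {t: cnn_learner(databunch_for_task(t), models.resnet18) for t in tasks}

def predict_case(item):
    # learn.predict returns (category, class index, probabilities);
    # element [1] of the probabilities is the positive class for a binary task
    return {t: float(learners[t].predict(item)[2][1]) for t in tasks}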

2 Likes

I think that’s a great approach. Let me finally reboot and see if I can actually get at the data at last… I spent an hour this morning with the new XResNet to better understand it, so if I can get the actual data I can set up some basic testing tonight.

1 Like

I hope everyone here, whether pursuing work with another team in parallel or not, will contribute to the common fastai-mrnet repo.

Also, merging notebooks can be challenging: Jeremy and Sylvain are fans of ReviewNB, a GitHub app that makes reviewing diffs in NBs much more user-friendly.

2 Likes

Great article in terms of helping us understand the domain details here… except this part:

> In my experience, the sagittal fluid-sensitive sequence (T2 fat-sat) is the best of these three for identifying meniscal and ACL pathology.

Maybe add that after we win the competition :slight_smile:

1 Like

I’ve enabled ReviewNB on the repo!
https://app.reviewnb.com/lessw2020/mrnet-fastai

@nswitanek - thanks for highlighting this app. It should make notebook merging a lot easier.

There is a Python package, jupytext, which creates a text representation (.py, .R, .md, etc.) in parallel with the notebook. You can edit either the notebook or the text file in an IDE and both are updated. Even if you open the text file in Jupyter, it is interpreted exactly like the original notebook.

I found it pretty awesome. Check it out https://github.com/mwouts/jupytext

The idea is to git control only the text file.
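
If you’d rather drive it from code than rely on the Jupyter extension, the Python API is roughly like this, as far as I can tell (file names are just examples):

import jupytext

# Read the notebook and write its paired text representation; only the .py
# would go into git, and jupytext keeps the two files in sync when either changes
nb = jupytext.read('MRNet_EDA.ipynb')
jupytext.write(nb, 'MRNet_EDA.py')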

2 Likes

I’ve spent the past day testing out the new activation function LiSHT/LightRelu. LightRelu (clamped and mean-shifted LiSHT) performs the best in terms of a smooth training curve, but so far ReLU still ends up better, on Imagenette at least.
I was hoping this might give us an edge for the competition, but at this point it hasn’t proven to beat ReLU in the end, though it’s more stable along the way.
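
For reference, LiSHT itself is just x * tanh(x), and LightRelu adds the clamp and mean shift on top. The base activation, as a minimal sketch:

import torch
import torch.nn as nn

class LiSHT(nn.Module):
    # LiSHT activation: f(x) = x * tanh(x)
    def forward(self, x):
        return x * torch.tanh(x)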

That said, I still don’t have the images yet, as the archive won’t unzip… so I’ll have to log in to FloydHub and try to get it there, since it seems this dataset doesn’t work on Windows.

I did read another study showing that most preprocessing techniques didn’t help with MRI classification for finding and outlining the hippocampus… rather, the attention mechanism was what helped the network the most (that was for brain scans, though).

That made me think: perhaps if we had one net that simply finds the main area of the knee joint (not for the axial plane, but for the side/front views), then cropped that region automatically and fed it to the classification network, that might be helpful, since all the leg muscle, outer fat, etc. carries no information for the network in terms of actual knee injury spotting.

I don’t have any experience with attention and neural nets, though, so maybe someone here does and can provide input?

Another thought that could make NB versioning a little easier… instead of having everyone working on the same NB, we could have people rename their individual NBs. E.g., @nswitanek could rename his to MRNet_EDA_expanded.ipynb or MRNet_EDA_ns.ipynb. That way, if I want to clean up the original NB, then (a) I don’t have to wait for his PR to get merged into master before submitting my own PR, or (b) @LessW2020 doesn’t have to futz around with conflict resolution between the two PRs.

FYI: I’m planning on submitting a PR soon that adds notebook2script.py from the course so we can export functions from different NBs for use in others.

1 Like

Yes please! Let’s do the individual notebooks while we are in R&D mode :slight_smile:

Related note - I’ve spent about 3 days now testing a variety of activation functions for image classification…I’ll make another thread with details but in order of performance:
1 - General Relu! (leak, mean shift and clamping)
2 - FTSwish - with mean shift and clamping
3 - LiSHT (LightRelu) - with mean shift and clamping
4 - ReLU

I’m making a more documented run now as I might write my first paper on it, but I’d highly recommend everyone use the General Relu we used in class to help speed things up/improve results.
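
For anyone who hasn’t tried it, here’s a minimal sketch of GeneralRelu along the lines of the course notebooks (the leak, mean-shift subtraction, and clamp are all optional):

import torch.nn as nn
import torch.nn.functional as F

class GeneralRelu(nn.Module):
    # ReLU with an optional leak, mean shift (subtraction), and max-value clamp
    def __init__(self, leak=None, sub=None, maxv=None):
        super().__init__()
        self.leak, self.sub, self.maxv = leak, sub, maxv

    def forward(self, x):
        x = F.leaky_relu(x, self.leak) if self.leak is not None else F.relu(x)
        if self.sub is not None: x.sub_(self.sub)
        if self.maxv is not None: x.clamp_max_(self.maxv)
        return x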

2 Likes

This might be something worth considering for our collaboration once we start model training experiments.

https://link.medium.com/lIaLk4tv4V

1 Like

Any update on getting code from the multi-branch architecture paper?

Good ideas! Have you been able to replicate and/or exceed the results of the original paper?

I did hear back from the author of the first paper I posted here. He said they have not released the code yet but would let me know when they do. He didn’t specify why.

So here’s a draft todo list:

1. Exploratory Data Analysis (EDA)
[x] Examine data format. For each case there are three NumPy array files (.npy), one per scan plane (axial, coronal, sagittal); each array has shape (s, 256, 256), corresponding to s images, or slices, each of dimension 256x256. The number of slices per plane differs from patient to patient.
[x] Visually examine examples.
[x] Note the probable scan techniques used for the data. Axial, coronal, and sagittal images use distinct methods.
[x] Tabulate occurrence of various categories (normal, abnormal, ACL tear, Meniscus tear), noting class imbalances.
[ ] other EDA steps…

2. Prep data for modeling
[x] Convert files to images files for toy model.
[ ] Create custom ItemBase subclass.
[ ] Create custom ItemList subclass.
[ ] Create labeled DataBunch
[ ] Make sure data splitting for train/valid/test is appropriate (the original paper’s authors kept all three scans per patient in the same set).
[ ] Normalize data.
[ ] Explore adding appropriate data augmentation strategies (rotations, shifts, flips, perspective warping, super-resolution, etc.), possibly dependent on the model used.
[ ] Other data prep steps…

3. Fit models to data
[ ] Start with a toy model fit on the middle slice from one plane, or from all three planes, using a pre-trained 2D model.
[ ] Choose and implement loss function (original paper uses cross-entropy loss, re-scaled to account for class imbalances).
[ ] Implement the competition’s target performance metric of AUC averaged across the three classification tasks (detection of abnormality, of ACL tears, and of meniscal tears); see the sketch below the list.
[ ] Replicate results of original paper using their model and data pre-processing approaches.
[ ] Improve on original paper by using fastai pre-processing and model tuning procedures.
[ ] Explore alternative model architectures for images and image sequences, where corresponding pre-trained model weights can be used.
[ ] Explore incorporating a segmentation step (ANT-GAN or other).
[ ] Explore volumetric, rather than 2D, approaches.
[ ] Explore parameter and hyperparameter search to improve expected performance on out-of-sample data.
[ ] Other model fitting steps…

4. Submit results
[ ] Make official submission.
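
To illustrate the metric item above, here’s a minimal sketch using scikit-learn (the (n_cases, 3) array layout is just an assumption):

import numpy as np
from sklearn.metrics import roc_auc_score

def competition_metric(y_true, y_score):
    # y_true and y_score are assumed to be arrays of shape (n_cases, 3), holding
    # ground-truth labels and predicted probabilities for abnormal, ACL, meniscus
    aucs = [roc_auc_score(y_true[:, i], y_score[:, i]) for i in range(3)]
    return float(np.mean(aucs))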

Lots to do. What do you want to work on?

2 Likes

For the toy problem, I’ve developed this subclass of ImageList.

from fastai.vision import *   # provides ImageList, Image, Tensor, and np

class MRImageList(ImageList):
    "ImageList that opens a .npy stack and returns its middle slice as a 3-channel image."

    def open(self, fn):
        x = np.load(fn)
        mid_slice = x.shape[0] // 2
        # Repeat the single grayscale slice across 3 channels for pre-trained models
        return self.arr2image(np.stack([x[mid_slice]] * 3, axis=0))

    @staticmethod
    def arr2image(arr:np.ndarray, div:bool=True, cls:type=Image):
        x = Tensor(arr)
        if div: x.div_(255)   # scale 0-255 pixel values to [0, 1]
        return cls(x)

I also reorganized the data into this tree structure to facilitate use of {}_from_df methods in the Data Block API.

..
├── data
│   ├── axial
│   │   ├── train
│   │   └── valid
│   ├── coronal
│   │   ├── train
│   │   └── valid
│   └── sagittal
│       ├── train
│       └── valid
└── mrnet-fastai

It also required merging the train and valid dfs into a single master df.
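
For what it’s worth, a rough sketch of how this might plug into the Data Block API, assuming the master df has columns like 'filename', 'is_valid', and one 0/1 column per label (names are just placeholders) - here labeling only the ACL task, in keeping with the separate-models idea:

# Hypothetical column names; swap in whatever the master df actually uses
data = (MRImageList.from_df(master_df, path=data_path/'sagittal', cols='filename')
        .split_from_df(col='is_valid')
        .label_from_df(cols='acl')
        .databunch(bs=16)
        .normalize())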

2 Likes

Nice work guys! I’m knee deep in the activation function research right now as Jeremy asked me to run all of them for 80 epochs on ImageWoof…but I hope to have that done late tonight/ early tomorrow.

The bonus is I hope to contribute an improved activation function for us to use here. GeneralRelu was the winner earlier, but then I added a mean shift to some of the new ones and that changed things. A question from the forum then prompted me to code up a new function (TRelu) and that ended up being the winner of the first runoff. Both FTSwish+ and TRelu beat out General Relu and of course Relu, as well as LiSHT.

TRelu is ReLU with a threshold value (default = -0.25) for all negative values, and a mean shift, to start.
Anyway I hope to finish all these runs tonight, post out the results and then get going here.
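
Roughly, in code (the mean-shift value here is just a placeholder):

import torch
import torch.nn as nn

class TRelu(nn.Module):
    # Like ReLU, but negative inputs map to a fixed threshold instead of 0,
    # and the output is mean-shifted; the shift default is a placeholder
    def __init__(self, threshold=-0.25, shift=0.0):
        super().__init__()
        self.threshold, self.shift = threshold, shift

    def forward(self, x):
        x = torch.where(x > 0, x, torch.full_like(x, self.threshold))
        return x - self.shift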

Also, there is another type of pooling function, MPConv, that is showing strong performance when coupled with ResNet… in theory, it helps the CNN generalize better. That’s something I’d also like to put into an XResNet and test out on the MRI images.

Results tomorrow for the activation runoff and talk to you soon!