Fastai v2 has a medical imaging submodule!

See here:

Preliminary usage outlined over here:
https://www.kaggle.com/jhoward/creating-a-metadata-dataframe

Thanks @jeremy for developing this! You have given me another reason to switch to fastai v2!

Also, it seems the submodule is still in its infancy. What are the future plans for the module?

The next walk-thru of the module is now available:

https://www.kaggle.com/jhoward/some-dicom-gotchas-to-be-aware-of-fastai

Plans are to support a full range of medical imaging modalities and activities, including microscopy, ultrasound, etc.

I also plan to create a fastai.medical.text module.

We've got a number of interesting projects starting at https://wamri.ai, across a wide range of datasets, institutions, and problems - hopefully the fastai.medical modules will be a useful foundation across all of them.

Nice work! This is very helpful, and I'm glad that other modalities are also planned.

Hello, fastai developers. Thanks a lot for your work, especially for the medical module, which helps me a lot in Kaggle competitions.
I have a question, @jeremy: I used your function dcm.hist_scaled_px(bins) from here and my training time increased by 30%. I tried to optimize this function to make it run fast on CUDA, but it didn't work out. Do you have any ideas on how to speed up this preprocessing on CUDA?
Since I don't have a lot of computing capacity, it would be very helpful. Thank you.

I'd suggest doing the hist scaling once and saving the scaled data, if it's too slow (and if you have the space). Alternatively, use more CPU workers and apply it lazily.

Another approach would be to use the numpy implementation instead - you can find it in the fastai.medical.imaging library. pytorch is missing an important piece of functionality (searchsorted) which makes this implementation slower.
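
For anyone wanting to try the first suggestion (scale once, save, reuse), here is a minimal sketch. Everything in it is an assumption for illustration: `load_or_compute` is a hypothetical helper, not part of fastai, and `expensive_scale` is a stand-in for whatever scaling call you actually use (such as dcm.hist_scaled_px(bins)). The cache is simply one .npy file per image.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_or_compute(path, compute):
    """Return the array cached at `path`; call `compute()` and save it if missing."""
    path = Path(path)
    if path.exists():
        return np.load(path)
    arr = np.asarray(compute(), dtype=np.float32)
    np.save(path, arr)
    return arr


rng = np.random.default_rng(0)
px = rng.normal(0.0, 500.0, size=(64, 64))  # stand-in for one image's raw pixels
calls = []  # records how many times the slow step actually runs


def expensive_scale():
    # Placeholder for the slow preprocessing step (hypothetical; swap in your own).
    calls.append(1)
    return (px - px.min()) / (px.max() - px.min())


cache_file = Path(tempfile.mkdtemp()) / "img0.npy"
first = load_or_compute(cache_file, expensive_scale)   # computed, then written to disk
second = load_or_compute(cache_file, expensive_scale)  # read back from the cache
print("slow step ran", len(calls), "time(s)")
```

The trade-off is disk space for training time: the scaled float32 arrays can be larger than the original files, which is why the suggestion above is conditional on having the space.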

Thank you. Where exactly is the NumPy implementation? Unfortunately, I can't find it.

I have a multilabel dbunch similar to Planets for RSNA kaggle competition like so:

What would be the best way to train a multi label learner in v2 please?

Never mind :slight_smile:
Found a tutorial in notebook 23 tutorial_transfer_learning

Ah sorry, I was mistaken. There's an array_freqhist_bins for getting the bins. Most of hist_scaled is actually using numpy - you could just copy that code and remove the tensor bits.
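
A NumPy-only version with the tensor bits removed might look roughly like the sketch below. It assumes the scaling is a piecewise-linear interpolation of pixel values onto evenly spaced outputs in [0, 1], one per bin edge. `hist_scaled_np` is a hypothetical name (not a fastai function), and the bin edges here are made up for illustration rather than computed from a real scan.

```python
import numpy as np


def hist_scaled_np(px, bins):
    """Map pixel values to [0, 1] by linear interpolation between bin edges.

    Sketch only: `bins` must be increasing; values outside the first and
    last edge are pinned to 0 and 1 respectively.
    """
    ys = np.linspace(0.0, 1.0, len(bins))     # one output level per bin edge
    scaled = np.interp(px.ravel(), bins, ys)  # piecewise-linear lookup
    return scaled.reshape(px.shape).clip(0.0, 1.0)


# Illustrative bin edges (assumed values, not taken from a real image).
bins = np.array([-1000.0, -200.0, 0.0, 40.0, 80.0, 1500.0])
px = np.array([[-2000.0, -1000.0, 0.0],
               [60.0, 1500.0, 3000.0]])
out = hist_scaled_np(px, bins)
print(out)
```

Since this runs on the CPU, it fits the "more CPU workers, applied lazily" route rather than the CUDA one.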

Context:
Following Jeremy's fantastic medical imaging kernels on Kaggle, I was trying to understand the freqhist_bins function.
From those kernels, I understand its use:

split the range of pixel values into groups, such that each group has around the same number of pixels

Images often have a bimodal pixel distribution. For the test image in notebook 60, the histogram looks something like:

plt.hist(dcm.pixels.flatten());

image

Which is no good, as it contains a lot of background pixels.

The normalized test image from the notebook itself looks something like this:

image


So when I do something like,

bins = dcm.pixels.freqhist_bins(20)
plt.hist(dcm.pixels.flatten(), bins=bins);

We get,
image

  1. How can we interpret the above histogram? From what I can see, there are more darker pixels, but the whiter pixels do not have y-axis values as high as 14,000 after freqhist_bins?
    Could we have guessed/expected this plot just from the first histogram above?

  2. Isn't freqhist_bins essentially creating linearly spaced bins and scaling them to len(imsd), which, from what I understand, is the length of the "sorted image pixels", i.e., the number of pixels in the image? If so, how does this give something like a uniform distribution?

Finally we get a mapping for this image as below:
plt.plot(bins, torch.linspace(0,1,len(bins)));
image

I am trying to understand the logic behind freqhist_bins so I can learn more about medical images! xD

Thanks in advance :slight_smile:

References:
https://www.kaggle.com/jhoward/don-t-see-like-a-radiologist-fastai
fastai_dev/dev/60_medical_imaging.ipynb

I think I've got it. It's actually pretty simple.

Once we flatten and sort the image pixels, we get a 1-D torch tensor which has values like tensor([-2157., -2157., -2155., ..., 1464., 1520., 1521.])

Next, we just make selection points or positions from this array and store in t like so:

    imsd = self.view(-1).sort()[0]
    t = torch.cat([tensor([0.001]),
                   torch.arange(n_bins).float()/n_bins+(1/2/n_bins),
                   tensor([0.999])])
    t = (len(imsd)*t).long()

Finally we just select the pixel values from these t positions from the image.

Now if an image is something like a bimodal distribution then it should have values like:
(this is an example)

[-200,-199,-199,-199,-198,-198,-198,-198,-198,-197,-196,-20,0,4,10,10,11,11,12,12,14,15]
and if the positions were [2,4,6,8,10] (counting from 0),
then the bins become [-199, -198, -198, -198, -196]

Thus we can see how this function is designed to split values into groups, such that each group has around the same number of values.
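
To check this numerically, here is a NumPy sketch of the same idea on synthetic bimodal data. `freqhist_bins_np` is a hypothetical stand-in that mirrors the snippet above, not the library function itself; the final `np.unique` step (dropping repeated break points) and the synthetic "background" and "tissue" values are my assumptions.

```python
import numpy as np


def freqhist_bins_np(px, n_bins=100):
    """Pick bin edges at evenly spaced positions in the sorted pixel values."""
    srt = np.sort(px.ravel())
    # Fractions of the way through the sorted pixels: 0.001, then the
    # midpoints of n_bins equal slices, then 0.999.
    t = np.concatenate([[0.001],
                        np.arange(n_bins) / n_bins + 1 / (2 * n_bins),
                        [0.999]])
    idx = (len(srt) * t).astype(np.int64)  # fractions -> positions in srt
    return np.unique(srt[idx])             # assumed: drop duplicate edges


rng = np.random.default_rng(0)
# Synthetic bimodal "image": many background values plus a smaller tissue peak.
px = np.concatenate([rng.normal(-1000.0, 20.0, 30000),
                     rng.normal(40.0, 30.0, 10000)])

bins = freqhist_bins_np(px, n_bins=20)
counts, _ = np.histogram(px, bins=bins)
print(counts)
```

The edges are evenly spaced in rank, not in pixel value, so every interior bin ends up with about 1/n_bins of the pixels (roughly 2,000 of the 40,000 here). The two end bins hold fewer because of the 0.001 and 0.999 cut points.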

Always plenty to learn from fastai :slight_smile:

I've been trying to run this very exciting Kaggle notebook on head CT images for head bleeds, without much success, and was wondering if someone could help.

The link is: https://www.kaggle.com/jhoward/from-prototyping-to-submission-fastai

If I use fast_dev code-base I get stuck early on trying to import all the variables and if I use fastai-v2 code base I get stuck on the Transform Pipeline - DataSource. Any ideas on how to solve the issue?

Thanks!

Luciano

This is the correct repo to use now @lprevedello: https://github.com/fastai/fastai2 .

If you paste the stack trace for the error you're getting, we'd be happy to take a look.

Thank you! fastai2 seems to have trouble with DataSource and Cuda().

Here is the error message I get:

tfms = [[fn2image], [fn2label,EncodedMultiCategorize(htypes)]]
dsrc = DataSource(fns, tfms, splits=splits)
nrm = Normalize(tensor([0.6]),tensor([0.25]))
aug = aug_transforms(p_lighting=0.)
batch_tfms = [IntToFloatTensor(), nrm, Cuda(), *aug]


NameError Traceback (most recent call last)
in
1 tfms = [[fn2image], [fn2label,EncodedMultiCategorize(htypes)]]
----> 2 dsrc = DataSource(fns, tfms, splits=splits)
3 nrm = Normalize(tensor([0.6]),tensor([0.25]))
4 aug = aug_transforms(p_lighting=0.)
5 batch_tfms = [IntToFloatTensor(), nrm, Cuda(), *aug]

NameError: name 'DataSource' is not defined

Those are outdated now :slight_smile: You should use Datasets() (instead of DataSource), and Cuda() is no longer a transform. (It's done automatically if the device is available.)

Perfect! Thank you!

It looks like the same is true for databunch, right?

def get_data(bs, sz):
    return dsrc.databunch(bs=bs, num_workers=nw, after_item=[ToTensor],
                          after_batch=batch_tfms+[AffineCoordTfm(size=sz)])

AttributeError: databunch

Yes, it's dataloaders now :slight_smile:

If anybody is able to fix my Kaggle kernels in a fork, I can update my notebooks to point at the fixed versions...

That would be phenomenal! That's what I was trying to do, but I do not have enough knowledge about fastai-v2 code yet.

I'd be happy to help you out with understanding the v2 code and getting it functional :slight_smile: I don't have the time to actually go in and fix it on Kaggle, but I'll help with any D/C's etc. Feel free to send me a DM while we sort it out!
