Fastai v2 has a medical imaging submodule!

ilovescience · October 8, 2019, 9:25pm

See here:

fastai/fastai_dev/blob/master/dev/60_medical_imaging.ipynb


Preliminary usage outlined over here:
https://www.kaggle.com/jhoward/creating-a-metadata-dataframe

Thanks @jeremy for developing this! You have given me another reason to switch to fastai v2!

Also, it seems the submodule is still in its infancy. What are the future plans for the module?

jeremy · October 9, 2019, 8:03pm

The next walk-thru of the module is now available:

https://www.kaggle.com/jhoward/some-dicom-gotchas-to-be-aware-of-fastai

Plans are to support a full range of medical imaging modalities and activities, including microscopy, ultrasound, etc.

I also plan to create a fastai.medical.text module.

We’ve got a number of interesting projects starting at https://wamri.ai, across a wide range of datasets, institutions, and problems - hopefully the fastai.medical modules will be a useful foundation across all of them.

redturtle · October 9, 2019, 9:16pm

Nice work! This is very helpful, and I’m glad that other modalities are also planned.

nikitam · October 15, 2019, 11:53am

Hello, fastai developers. Thanks a lot for your work, especially for the medical module, which helps me a lot in Kaggle competitions.
So I have a question, @jeremy: I used your function dcm.hist_scaled_px(bins) from here, and my training time increased by 30%. I tried to optimize this function to run fast on CUDA, but it didn’t work out. Do you have any ideas on how to speed up this preprocessing on CUDA?
Since I don’t have a lot of computing capacity, this would be very helpful. Thank you.

jeremy · October 15, 2019, 2:14pm

I’d suggest doing the hist scaling once and saving the scaled data if it’s too slow (and if you have the space). Alternatively, use more CPU workers and apply it lazily.
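A minimal sketch of the "scale once, save, reload" idea, assuming each image is already available as a NumPy array. `cache_scaled` and `load_scaled` are hypothetical helper names, and `scale_fn` stands in for whatever slow preprocessing you want to run only once (e.g. histogram scaling). Storing the [0, 1] output as uint8 is an assumption that trades some precision for a quarter of the disk space of float32.

```python
import numpy as np
from pathlib import Path

def cache_scaled(arrays, scale_fn, out_dir):
    """Apply the (slow) scale_fn once per image and save the result to disk.

    arrays:   dict mapping an image id to a NumPy array of raw pixel values.
    scale_fn: callable returning values in [0, 1] (hypothetical stand-in for
              histogram scaling or any other expensive preprocessing).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, px in arrays.items():
        scaled = np.clip(scale_fn(px), 0.0, 1.0)
        # Store as uint8 (1 byte/pixel) to save space; this quantizes to 256 levels
        as_bytes = np.round(scaled * 255).astype(np.uint8)
        path = out / f"{name}.npy"
        np.save(path, as_bytes)
        paths[name] = path
    return paths

def load_scaled(path):
    """Reload a cached image and convert it back to float32 in [0, 1]."""
    return np.load(path).astype(np.float32) / 255.0
```

During training, the dataset would then only call `load_scaled`, so the expensive step runs once up front instead of on every epoch.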

Another approach would be to use the NumPy implementation instead; you can find it in the fastai.medical.imaging library. PyTorch is missing an important piece of functionality (searchsorted), which makes this implementation slower.

nikitam · October 16, 2019, 2:25am

Thank you. Where exactly is the NumPy implementation? Unfortunately, I can’t find it.

arora_aman · October 16, 2019, 11:38am

I have a multi-label DataBunch, similar to the Planet one, for the RSNA Kaggle competition.

What would be the best way to train a multi-label learner in v2, please?

arora_aman · October 16, 2019, 1:18pm

Never mind, I found a tutorial in notebook 23 (tutorial_transfer_learning).

jeremy · October 17, 2019, 1:12am

Ah, sorry, I was mistaken. There’s an array_freqhist_bins function for getting the bins. Most of hist_scaled actually uses numpy, so you could just copy that code and remove the tensor bits.
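Following that suggestion, here is a rough NumPy-only sketch of histogram scaling. It is not the library's code: `freqhist_bins_np` and `hist_scaled_np` are hypothetical names. The bin positions mirror the torch snippet quoted later in this thread (a 0.1% and a 99.9% endpoint plus `n_bins` evenly spaced midpoints). Dropping duplicate break points with `np.unique` is an assumption made so that `np.interp` receives increasing break points, and `np.interp` performs the piecewise-linear mapping from the break points to evenly spaced values in [0, 1].

```python
import numpy as np

def freqhist_bins_np(px, n_bins=100):
    """Break points that split the pixel values into groups of roughly equal size."""
    imsd = np.sort(px.ravel())  # flatten and sort all pixel values
    # Quantile levels: 0.1%, n_bins evenly spaced midpoints, 99.9%
    t = np.concatenate([[0.001],
                        np.arange(n_bins) / n_bins + 1 / (2 * n_bins),
                        [0.999]])
    idx = (len(imsd) * t).astype(np.int64)  # levels -> integer positions
    return np.unique(imsd[idx])  # drop repeated values (common with integer pixels)

def hist_scaled_np(px, bins=None):
    """Map pixels through the equal-frequency bins to values in [0, 1]."""
    if bins is None:
        bins = freqhist_bins_np(px)
    ys = np.linspace(0.0, 1.0, len(bins))  # evenly spaced targets, one per break point
    # Piecewise-linear mapping; values outside [bins[0], bins[-1]] clamp to 0 or 1
    return np.interp(px.ravel(), bins, ys).reshape(px.shape)

# Usage on a synthetic bimodal "image": lots of background plus a tissue-like peak
rng = np.random.default_rng(0)
px = np.concatenate([rng.normal(-1000, 50, 60_000),
                     rng.normal(40, 30, 40_000)]).reshape(250, 400)
scaled = hist_scaled_np(px)
counts, _ = np.histogram(scaled, bins=10, range=(0, 1))
print(counts / scaled.size)  # each tenth of [0, 1] holds roughly 10% of the pixels
```

Because each break point sits at an equal step in rank, the scaled values come out close to uniformly distributed, regardless of how lopsided the raw histogram is.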

arora_aman · October 18, 2019, 11:17pm

Context:
Just regarding medical imaging: following Jeremy’s fantastic kernels on Kaggle, I was trying to understand the freqhist_bins function.
From those kernels, I understand its use:

split the range of pixel values into groups, such that each group has around the same number of pixels

Images can have a bimodal distribution of pixel values or, as with the test image in notebook 60, something like:

plt.hist(dcm.pixels.flatten());

This is not great, as it contains a lot of background pixels.

The normalized test image from the notebook itself looks something like this:

So when I do something like,

bins = dcm.pixels.freqhist_bins(20)
plt.hist(dcm.pixels.flatten(), bins=bins);

We get,

How can we interpret the above histogram? From what I can see, there are more dark pixels, but after freqhist_bins the lighter pixels don’t reach y-axis values as high as 14,000?
Could we have guessed/expected this plot just from the first histogram above?
Isn’t freqhist_bins essentially creating linearly spaced positions and scaling them by len(imsd), which, from what I understand, is the length of the sorted image pixels (i.e., the number of pixels in the image)? If so, how does this give something like a uniform distribution?

Finally we get a mapping for this image as below:
plt.plot(bins, torch.linspace(0,1,len(bins)));

I am trying to understand the logic behind freqhist_bins so I can learn more about medical images! xD

Thanks in advance

References:

fastai_dev/dev/60_medical_imaging.ipynb

arora_aman · October 20, 2019, 12:22am

I think I’ve got it. It’s actually pretty simple.

Once we flatten and sort the image pixels, we get a 1-D torch tensor which has values like tensor([-2157., -2157., -2155., ..., 1464., 1520., 1521.])

Next, we compute selection points (positions) into this array and store them in t, like so:

    imsd = self.view(-1).sort()[0]
    t = torch.cat([tensor([0.001]),
                   torch.arange(n_bins).float()/n_bins+(1/2/n_bins),
                   tensor([0.999])])
    t = (len(imsd)*t).long()

Finally, we just select the pixel values at these t positions in the sorted array.

Now, if an image has something like a bimodal distribution, its sorted pixel values might look like this (a made-up example):

[-200,-199,-199,-199,-198,-198,-198,-198,-198,-197,-196,-20,0,4,10,10,11,11,12,12,14,15]
and if the positions were [2, 4, 6, 8, 10], then the bins would be [-199, -198, -198, -198, -196] (counting positions from 0).

Thus we can see how this function is designed to split values into groups, such that each group has around the same number of values.
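To make the worked example concrete, here is the same position arithmetic applied to the made-up 22-value array above, using the formula from the snippet (0.1% and 99.9% endpoints plus `n_bins` midpoints) instead of hand-picked positions. It uses NumPy rather than torch purely for brevity, and `n_bins=5` is an arbitrary choice for illustration.

```python
import numpy as np

# The made-up bimodal example from above, already sorted
imsd = np.array([-200, -199, -199, -199, -198, -198, -198, -198, -198, -197, -196,
                 -20, 0, 4, 10, 10, 11, 11, 12, 12, 14, 15])
n_bins = 5

# Same quantile levels as the torch snippet: 0.001, the n_bins midpoints, 0.999
t = np.concatenate([[0.001], np.arange(n_bins) / n_bins + 1 / (2 * n_bins), [0.999]])
positions = (len(imsd) * t).astype(np.int64)  # scale levels to integer indices
bins = imsd[positions]                         # pick values at those ranks

print(positions.tolist())  # [0, 2, 6, 11, 15, 19, 21]
print(bins.tolist())       # [-200, -199, -198, -20, 10, 12, 15]
```

The interior positions are about 22/5 ≈ 4.4 ranks apart, so each group covers roughly the same number of values. As a result, three break points crowd into the dense cluster around -200 to -198, while a single gap spans the sparse range from -198 to -20.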

Always plenty to learn from fastai

lprevedello · March 6, 2020, 3:25pm

I’ve been trying to run this very exciting Kaggle notebook on head CT images for detecting head bleeds, without much success, and was wondering if someone could help.

The link is: https://www.kaggle.com/jhoward/from-prototyping-to-submission-fastai

If I use the fastai_dev codebase, I get stuck early on trying to import all the variables, and if I use the fastai-v2 codebase, I get stuck on the Transform pipeline (DataSource). Any ideas on how to solve this?

Thanks!

Luciano

jeremy · March 6, 2020, 5:37pm

This is the correct repo to use now @lprevedello: https://github.com/fastai/fastai2 .

If you paste the stack trace for the error you’re getting, we’d be happy to take a look.

lprevedello · March 6, 2020, 8:01pm

Thank you! Fastai2 seems to have trouble with DataSource and Cuda().

Here is the error message I get:

tfms = [[fn2image], [fn2label,EncodedMultiCategorize(htypes)]]
dsrc = DataSource(fns, tfms, splits=splits)
nrm = Normalize(tensor([0.6]),tensor([0.25]))
aug = aug_transforms(p_lighting=0.)
batch_tfms = [IntToFloatTensor(), nrm, Cuda(), *aug]

NameError Traceback (most recent call last)
in
1 tfms = [[fn2image], [fn2label,EncodedMultiCategorize(htypes)]]
----> 2 dsrc = DataSource(fns, tfms, splits=splits)
3 nrm = Normalize(tensor([0.6]),tensor([0.25]))
4 aug = aug_transforms(p_lighting=0.)
5 batch_tfms = [IntToFloatTensor(), nrm, Cuda(), *aug]

NameError: name 'DataSource' is not defined

muellerzr · March 6, 2020, 8:10pm

Those are outdated now: you should use Datasets() instead of DataSource, and Cuda() is no longer a transform (moving batches to the GPU is done automatically if one is available).

lprevedello · March 6, 2020, 9:07pm

Perfect! Thank you!

It looks like the same is true for databunch, right?

def get_data(bs, sz):
    return dsrc.databunch(bs=bs, num_workers=nw, after_item=[ToTensor],
                          after_batch=batch_tfms+[AffineCoordTfm(size=sz)])

AttributeError: databunch

muellerzr · March 6, 2020, 9:08pm

Yes, it’s dataloaders now

jeremy · March 6, 2020, 9:27pm

If anybody is able to fix my Kaggle kernels in a fork, I can update my notebooks to point at the fixed versions…

lprevedello · March 6, 2020, 10:22pm

That would be phenomenal! That’s what I was trying to do, but I do not have enough knowledge about fastai-v2 code yet.

muellerzr · March 6, 2020, 10:23pm

I’d be happy to help you out with understanding the v2 code and getting it functional. I don’t have the time to actually go in and fix it on Kaggle, but I’ll help with any D/C’s etc. Feel free to send me a DM while we sort it out!