Dataloader error with DICOM files

Martin2 · June 19, 2020, 3:28pm

Hi,
I’m trying to build a basic pneumonia classifier based on this dataset with DICOM files: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge.

I tried copying the process from this fastai2 documentation notebook: https://github.com/fastai/fastai2/blob/master/nbs/61_tutorial.medical_imaging.ipynb

Unfortunately I get the following error when I try to create the dataloader:

UnboundLocalError Traceback (most recent call last)
in ()
----> 1 dls = pneumonia.dataloaders(df.values, bs=bs)

11 frames
/usr/local/lib/python3.6/dist-packages/fastai2/medical/imaging.py in create(cls, fn, mode)
43 if isinstance(fn,bytes): im = Image.fromarray(pydicom.dcmread(pydicom.filebase.DicomBytesIO(fn)).pixel_array)
44 if isinstance(fn,Path): im = Image.fromarray(dcmread(fn).pixel_array)
---> 45 im.load()
46 im = im._new(im.im)
47 return cls(im.convert(mode) if mode else im)

UnboundLocalError: local variable 'im' referenced before assignment

My datablock looks like this:

pneumonia = DataBlock(blocks=(ImageBlock(cls=PILDicom), CategoryBlock),
                      splitter=RandomSplitter(valid_pct=0.2, seed=42),
                      get_x=ColReader(0, pref=pneumonia_img, suff='.dcm'),
                      get_y=ColReader(1),
                      batch_tfms=aug_transforms(size=size))

And my dataloader like this:

dls = pneumonia.dataloaders(df.values, bs=bs)

I did get the tutorial notebook for medical imaging working without problems, and it uses a very similar dataset structure. Did anyone run into a similar problem with this dataset?

vferrer · June 19, 2020, 4:02pm

It seems like a bug.

class PILDicom(PILBase):
    _open_args,_tensor_cls,_show_args = {},TensorDicom,TensorDicom._show_args
    @classmethod
    def create(cls, fn:(Path,str,bytes), mode=None)->None:
        "Open a `DICOM file` from path `fn` or bytes `fn` and load it as a `PIL Image`"
        if isinstance(fn,bytes): im = Image.fromarray(pydicom.dcmread(pydicom.filebase.DicomBytesIO(fn)).pixel_array)
        if isinstance(fn,Path): im = Image.fromarray(dcmread(fn).pixel_array)
        im.load()  # <- Here, if your fn isn't a Path or byte object, it'll crash
        im = im._new(im.im)
        return cls(im.convert(mode) if mode else im)

Try passing the path to the image and it should work. The path should be an instance of the Path class.

Edit: I’ll report it on GitHub.
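
The failure mode described above can be reproduced without fastai2 or any DICOM file. The sketch below uses a stand-in function that mirrors only the two isinstance branches of the quoted create method; the names create_like_pildicom, create_accepting_str and outcome are made up for illustration and are not part of the library. It assumes the value reaching create is a plain str, which is one possible way to end up in neither branch.

```python
from pathlib import Path

def create_like_pildicom(fn):
    # Stand-in mirroring the two isinstance branches quoted above.
    # It returns a label instead of actually reading a DICOM file.
    if isinstance(fn, bytes): im = "opened from bytes"
    if isinstance(fn, Path): im = "opened from path"
    return im  # `im` was never assigned if neither branch ran

def create_accepting_str(fn):
    # Hypothetical variant: convert str to Path first, and fail with a
    # clear message for any other unsupported type.
    if isinstance(fn, str): fn = Path(fn)
    if isinstance(fn, bytes): return "opened from bytes"
    if isinstance(fn, Path): return "opened from path"
    raise TypeError(f"expected Path, str or bytes, got {type(fn).__name__}")

def outcome(func, fn):
    # Return the result, or the name of the exception that was raised.
    try:
        return func(fn)
    except Exception as e:
        return type(e).__name__

print(outcome(create_like_pildicom, Path("img.dcm")))  # opened from path
print(outcome(create_like_pildicom, "img.dcm"))        # UnboundLocalError
print(outcome(create_accepting_str, "img.dcm"))        # opened from path
```

If the getter produces a str rather than a Path, wrapping it in Path(...) before it reaches create would avoid the error in the same way the second function does.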

Martin2 · June 19, 2020, 4:26pm

Hi,

Thanks for the quick reply. The pref=pneumonia_img argument is the Path to my image folder, which I defined as:

pneumonia_source = Path(root_dir + 'Datasets/rsna-pneumonia-detection-challenge/')
pneumonia_img = pneumonia_source/f"stage_2_train_images/"
pneumonia_img
Path('/content/gdrive/My Drive/Datasets/rsna-pneumonia-detection-challenge/stage_2_train_images')

Or should I pass the path somewhere else?

vferrer · June 20, 2020, 7:03am

In that case, I don’t know why it doesn’t work. If you are using Jupyter Notebook / Colab, you can debug the issue post mortem. You have two options:

Set %pdb on before running the code, so that pdb kicks in whenever an error is raised. It’s a global setting, so you can run it at the beginning of the notebook. Use %pdb off to disable it.
Run %debug after the error occurs. It’s like %pdb on, but you choose manually which error to debug.

Personally, I prefer running %debug after I get an error, because I control which error I debug.

Please post your findings here. It may be another bug.
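
Inside the debugger prompt, the useful thing to check for this error would be the type of fn in the frame of create. As a rough, non-interactive sketch of the same inspection, the code below catches the exception and reads the local variables of the frame that raised it. The function create_like_pildicom is a stand-in for the library method and locals_at_failure is a made-up helper, so treat this as an illustration only.

```python
from pathlib import Path

def create_like_pildicom(fn):
    # Stand-in for the failing method: same branches, no DICOM reading.
    if isinstance(fn, bytes): im = "opened from bytes"
    if isinstance(fn, Path): im = "opened from path"
    return im

def locals_at_failure(func, *args):
    # Call `func`; if it raises, return the local variables of the
    # innermost frame (the one that raised). Return None on success.
    try:
        func(*args)
    except Exception as e:
        tb = e.__traceback__
        while tb.tb_next is not None:  # walk down to the raising frame
            tb = tb.tb_next
        return dict(tb.tb_frame.f_locals)
    return None

found = locals_at_failure(create_like_pildicom, "some_id.dcm")
# Show each local variable with the name of its type.
print({name: type(value).__name__ for name, value in found.items()})
```

In this sketch the only local that exists at the point of failure is fn, and its type is str, which is why neither branch assigned im.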

Martin2 · June 21, 2020, 7:51am

Hi,

I couldn’t get more information from the %debug method, but I managed to get it to work! I don’t understand why the ColReader method didn’t work… This is how my DataBlock and dataloader look now that they work:

pneumonia = DataBlock(blocks=(ImageBlock(cls=PILDicom), CategoryBlock),
                      get_x=lambda x:pneumonia_img/f"{x[0]}.dcm",
                      get_y=lambda x:x[5],
                      splitter=RandomSplitter(),
                      batch_tfms=aug_transforms(size=size))

dls = pneumonia.dataloaders(df)

Thanks for the help!

amritv · June 22, 2020, 6:17am

Glad you got this working. I started a blog taking a deeper look at medical imaging with fastai a couple of months ago. You can view the blog site (which uses fastpages :)) here: Medical Imaging. One of the blog pages looks specifically at notebook_60.

I also have a number of functions that I had to update, for example for datasets that have more than 1 frame per DICOM, changing the photometric representation, cmaps not being implemented with show_images, and implementing features such as DicomSplit, which takes into account whether the same patient occurs in both the test and validation sets. It was originally posted on this thread.

What I do find works with ColReader is something like this:

blocks = (ImageBlock(cls=PILDicom),
          CategoryBlock)

getters = [
    ColReader('FileID', pref=source/'train/'),
    ColReader('DiaID')
]

and then specify the DataBlock

test = DataBlock(blocks=blocks,
                 getters=getters,
                 splitter=RandomSplitter(),
                 item_tfms=Resize(256),
                 batch_tfms=tfms)

Hope that helps

Martin2 · June 22, 2020, 9:02am

Hi @amritv,

That is an interesting blog, I’m definitely going to read it.

Seesam · July 31, 2020, 2:22pm

I was struggling with loading DICOM files as well, and it only worked when using a lambda. However, learn.export() doesn’t support lambda functions, so it is not possible to pickle the model.
@Martin2 If you tried to bring the model into production, did you run into the same problem?
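
A common way around the lambda limitation is to replace each lambda with a named, module-level function, since pickle stores functions by reference to their name. The sketch below rewrites the two lambdas from the working DataBlock earlier in the thread in that style. The value of pneumonia_img here is a placeholder, and can_pickle is a made-up helper that only demonstrates why a lambda fails; whether learn.export() then succeeds in a given setup is not confirmed in this thread.

```python
import pickle
from pathlib import Path

# Placeholder location; in the thread this Path points at the poster's
# stage_2_train_images folder.
pneumonia_img = Path("stage_2_train_images")

def get_x(row):
    # Same logic as `lambda x: pneumonia_img/f"{x[0]}.dcm"`, as a named function.
    return pneumonia_img / f"{row[0]}.dcm"

def get_y(row):
    # Same logic as `lambda x: x[5]`.
    return row[5]

def can_pickle(obj):
    # Report whether the standard pickle module can serialize `obj`.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

# A lambda has no importable name, so pickle cannot store a reference to it.
print(can_pickle(lambda x: x[5]))
```

These would be passed as get_x=get_x and get_y=get_y in the DataBlock. Because the functions are pickled by reference, the same definitions need to be available in the environment where the exported model is loaded.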