Load dicom to image classifier

Hi, I’m Angel Brisco. I have a project to develop a U-Net for pancreas segmentation. I haven’t found a way to feed the ImageDataClasiffier with DICOM files, especially in the way the files are usually downloaded. I have managed to load my database into a 4D numpy array (patient, number of slices, 512, 512). Is there a way to feed this array to a fastai convolutional learner, or to feed it in through a PyTorch tensor/DataLoader?
I will share my code to load the DICOM files in a “fastai way” once it is fully functional with the API and readable.
Looking forward to your answers.

I haven’t tested this out, and there might be a typo, but I think you will just need to do something like this:

import pydicom
import torch
from fastai.vision import *

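def open_dicom(fn):
    # Read the DICOM file and wrap its pixel data in a fastai Image
    x = pydicom.dcmread(str(fn))
    return Image(torch.Tensor(x.pixel_array))

class DicomItemList(ImageItemList):
    # Override the default image loader so items are read as DICOM
    def open(self, fn): return open_dicom(fn)

# Note the trailing comma: ('.dcm') is just a string, ('.dcm',) is a tuple
data = (DicomItemList.from_folder('path/to/my/data', extensions=('.dcm',))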
        .split_by_folder()
        .label_from_folder()
        .databunch(bs=32))

Replace .split_by_folder() and .label_from_folder() to suit your use case, as described here. Then you will be able to use the learner API as normal, e.g.,

learn = create_cnn(data, models.resnet18, metrics=accuracy)
learn.fit(1)

Again, this solution probably needs some work since I haven’t tested it, but I am fairly sure the best solution will look something like this. Someone can correct me if I’m wrong.
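
For the other route raised in the question, an array that is already in memory, one option is a map-style dataset. PyTorch’s DataLoader accepts any object that implements __len__ and __getitem__, so the sketch below flattens a (patient, slice, height, width) volume into individual slices. The class name SliceDataset and the toy nested list are made up for illustration; with real data you would pass the numpy array and probably convert each slice to a float tensor before returning it. It has not been tried against fastai itself.

```python
class SliceDataset:
    """Map-style dataset over a (patient, slice, H, W) volume (hypothetical helper)."""

    def __init__(self, volumes, masks=None):
        self.volumes = volumes
        self.masks = masks
        # Assumes every patient has the same number of slices,
        # which holds for a regular 4D array.
        self.n_slices = len(volumes[0])

    def __len__(self):
        return len(self.volumes) * self.n_slices

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Map the flat index back to (patient, slice within that patient).
        patient, slc = divmod(idx, self.n_slices)
        image = self.volumes[patient][slc]
        if self.masks is None:
            return image
        return image, self.masks[patient][slc]


# Toy stand-in for the real array: 2 patients, 3 slices, 2x2 pixels each.
volumes = [[[[p * 10 + s] * 2] * 2 for s in range(3)] for p in range(2)]
ds = SliceDataset(volumes)
print(len(ds))  # 6
print(ds[4])    # slice 1 of patient 1
```

Passing the segmentation masks as a second array of the same shape makes each item an (image, mask) pair, which is the shape of data a segmentation model usually expects.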

I was reading what you have recommended and this seems the best solution. I will try it.
Thanks

Could you explain what’s happening here:

class DicomItemList(ImageItemList):
    def open(self, fn): return open_dicom(fn)

We are just creating a new ItemList which supports opening DICOM files. This is accomplished by overriding the open method in ImageItemList with an open method that supports DICOM files (e.g., the open_dicom function defined in my previous post).
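
A stripped-down illustration of the same idea, with no fastai involved: the base class calls self.open for every item, so a subclass only has to replace that one method. BaseItemList, DicomLikeItemList and the fake loaders are invented for the example and are not fastai classes.

```python
class BaseItemList:
    """Toy stand-in for an item list: holds file names, opens them on access."""

    def __init__(self, items):
        self.items = list(items)

    def open(self, fn):
        # Default loader; a real one would decode a JPEG or PNG here.
        return f"image:{fn}"

    def __getitem__(self, i):
        # Indexing always goes through self.open, whichever class defines it.
        return self.open(self.items[i])


class DicomLikeItemList(BaseItemList):
    def open(self, fn):
        # The override: same interface, different file format.
        return f"dicom:{fn}"


base = BaseItemList(["a.png"])
dicom = DicomLikeItemList(["scan.dcm"])
print(base[0])   # image:a.png
print(dicom[0])  # dicom:scan.dcm
```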

thanks

Hi, I can’t find an example of label_from_list.
I was able to fit the CTs into one item list and the labels into another list. Does the label_from_list command take a regular list of items or another ItemList?

Hi Angel,

I am having a similar issue with DICOM. I tried to replace open_image with open_dcm_image as shown below:

def open_dcm_image(fn:PathOrStr, div:bool=True, convert_mode:str='RGB', cls:type=Image)->Image:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning) # EXIF warning from TiffPlugin
        array = pydicom.dcmread(str(fn)).pixel_array
        x = PIL.Image.fromarray(array).convert(convert_mode)
    x = pil2tensor(x,np.float32)
    if div: x.div_(255)
    return cls(x)

because open_image (default function) looks like this:

def open_image(fn:PathOrStr, div:bool=True, convert_mode:str='RGB', cls:type=Image)->Image:
    "Return Image object created from image in file fn."
    print('Hi there')
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning) # EXIF warning from TiffPlugin
        x = PIL.Image.open(fn).convert(convert_mode)
    x = pil2tensor(x,np.float32)
    if div: x.div_(255)
    return cls(x)

Then I used the advice from @jcreinhold for creating the DicomItemList:

class DicomItemList(ImageItemList):
    def open(self, fn): return open_dcm_image(fn)

But after that, I get an error when trying to load the data:

data = (DicomItemList.from_folder(data_folder, extensions=('.DCM'))
        .split_by_folder()
        .label_from_folder()
        .databunch(bs=32))

which should be the way to load, I guess, considering my data_folder is the PosixPath to a folder containing my data, with each folder name the label of each DCM image.

My error is as follows:


IndexError Traceback (most recent call last)
in
----> 1 data = (DicomItemList.from_folder(data_folder, extensions=('.DCM'))
      2         .split_by_folder()
      3         .label_from_folder()
      4         .databunch(bs=32))

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in split_by_folder(self, train, valid)
    175     def split_by_folder(self, train:str='train', valid:str='valid')->'ItemLists':
    176         "Split the data depending on the folder (train or valid) in which the filenames are."
--> 177         return self.split_by_idxs(self._get_by_folder(train), self._get_by_folder(valid))
    178
    179     def random_split_by_pct(self, valid_pct:float=0.2, seed:int=None)->'ItemLists':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in _get_by_folder(self, name)
    171
    172     def _get_by_folder(self, name):
--> 173         return [i for i in range_of(self) if self.items[i].parts[self.num_parts]==name]
    174
    175     def split_by_folder(self, train:str='train', valid:str='valid')->'ItemLists':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in <listcomp>(.0)
    171
    172     def _get_by_folder(self, name):
--> 173         return [i for i in range_of(self) if self.items[i].parts[self.num_parts]==name]
    174
    175     def split_by_folder(self, train:str='train', valid:str='valid')->'ItemLists':

IndexError: index 0 is out of bounds for axis 0 with size 0

I also tried this:

np.random.seed(42)
data = ImageDataBunch.from_folder(data_folder, train=".", valid_pct=0.2,
        ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

And got the same index error. Any help would be appreciated. Thanks!
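
An IndexError that reports “size 0” suggests the item list came back empty, that is, no file matched before any splitting or labelling happened. One quick check, independent of fastai, is to count the file suffixes that actually exist under the data folder. The helper name count_suffixes and the temporary folder are made up for the example.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_suffixes(folder):
    """Count the suffixes of all files below `folder`, exactly as spelled on disk."""
    return Counter(p.suffix for p in Path(folder).rglob("*") if p.is_file())


# Build a tiny throwaway tree with two label folders.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("A/img1.DCM", "B/imgA.DCM", "B/notes.txt"):
        path = Path(tmp, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    counts = count_suffixes(tmp)

print(counts)
```

If the suffix you pass as extensions does not appear in this count with the spelling you expect, the loader has nothing to work with and every later step fails in the same way.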

What does your data directory look like? Is it like:

data_folder
|---- train
|    |---- healthy
|    |    |- img1.DCM
|    |    |- ...
|
|    |---- disease
|    |    |- img2.DCM
|    |    |- ...
|
|---- valid
|    |---- healthy
|    |    |- imgA.DCM
|    |    |- ...
|
|    |---- disease
|    |    |- imgB.DCM
|    |    |- ...

If you are missing the train and valid directories, then don’t use the split_by_folder option. Use whatever one of the other split options is relevant to you (see here).
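
Judging from the traceback above, split_by_folder compares the first path component below the data folder with 'train' and 'valid'. A rough standalone check along those lines is sketched here; the function name and the sample paths are hypothetical, and this is not fastai’s code.

```python
from pathlib import PurePosixPath


def top_level_folders(root, files):
    """Return the set of first path components of `files` relative to `root`."""
    root = PurePosixPath(root)
    return {PurePosixPath(f).relative_to(root).parts[0] for f in files}


# A layout with label folders directly under the data folder, no train/valid level.
files = ["data_folder/healthy/img1.DCM", "data_folder/disease/img2.DCM"]
found = top_level_folders("data_folder", files)
print(sorted(found))                # ['disease', 'healthy']
print({"train", "valid"} <= found)  # False: nothing for a folder split to match
```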

My project is a segmentation one, so I don’t split by folder; I use the random option. I will label from a list once I find out how to do it (I’m a surgery resident with little time for this most of the time). But the solution from @jcreinhold works perfectly for me. I even altered it to load a .npy file instead of a DICOM file.

My data directory looks like:

data_folder
|---- T1
| |- img1.DCM
|---- T2
| |- imgA.DCM

where it should randomly be split between training and testing. I can’t seem to figure it out without the indexing error.
Thanks so much for your help.

The function .split_by_folder() looks for data_folder/train/ and data_folder/valid/ and doesn’t find them, which causes the error. Use random_split_by_pct() in place of split_by_folder().
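
Conceptually, a random percentage split just shuffles the item indices and holds out a fraction for validation. A minimal sketch of that idea in plain Python follows; it is not fastai’s implementation, and random_split and its arguments are invented for illustration.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=None):
    """Return (train_idxs, valid_idxs) for n_items, holding out valid_pct of them."""
    rng = random.Random(seed)  # local generator, so a given seed reproduces the split
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    return sorted(idxs[cut:]), sorted(idxs[:cut])


train, valid = random_split(10, valid_pct=0.2, seed=42)
print(len(train), len(valid))  # 8 2
```

Because the split works on indices only, it does not care how the folders are named, which is why it suits a layout without train and valid directories.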

I meant to say that I was using

data = (DicomItemList.from_folder(data_folder, extensions=('.DCM'))
        .label_from_folder()
        .no_split()
        .databunch(bs=32))

I also tried using random_split_by_pct() and I am still getting the same error.

Thanks again,
Julia

Error for reference:
IndexError Traceback (most recent call last)
in
----> 1 data = (DicomItemList.from_folder(data_folder, extensions=('.DCM'))
      2         .label_from_folder()
      3         .no_split()
      4         .databunch(bs=32))

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_folder(self, **kwargs)
    248     def label_from_folder(self, **kwargs)->'LabelList':
    249         "Give a label to each filename depending on its folder."
--> 250         return self.label_from_func(func=lambda o: o.parts[-2], **kwargs)
    251
    252     def label_from_re(self, pat:str, full_path:bool=False, **kwargs)->'LabelList':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_func(self, func, **kwargs)
    244     def label_from_func(self, func:Callable, **kwargs)->'LabelList':
    245         "Apply func to every input to get its label."
--> 246         return self.label_from_list([func(o) for o in self.items], **kwargs)
    247
    248     def label_from_folder(self, **kwargs)->'LabelList':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_list(self, labels, **kwargs)
    220         "Label self.items with labels."
    221         labels = array(labels, dtype=object)
--> 222         label_cls = self.get_label_cls(labels, **kwargs)
    223         y = label_cls(labels, path=self.path, **kwargs)
    224         res = self._label_list(x=self, y=y)

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in get_label_cls(self, labels, label_cls, sep, **kwargs)
    210         if label_cls is not None: return label_cls
    211         if self.label_cls is not None: return self.label_cls
--> 212         it = index_row(labels,0)
    213         if sep is not None: return MultiCategoryList
    214         if isinstance(it, (float, np.float32)): return FloatList

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/core.py in index_row(a, idxs)
    221         if isinstance(res,(pd.DataFrame,pd.Series)): return res.copy()
    222         return res
--> 223     return a[idxs]
    224
    225 def func_args(func)->bool:

IndexError: index 0 is out of bounds for axis 0 with size 0

Order matters. Use no_split() before label_from_folder().

I actually thought it might, and tried that as well, very similar error…


IndexError Traceback (most recent call last)
in
----> 1 data = (DicomItemList.from_folder(data_folder, extensions=('.DCM'))
      2         .no_split()
      3         .label_from_folder()
      4         .databunch(bs=16))

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
    407         assert isinstance(fv, Callable)
    408         def _inner(*args, **kwargs):
--> 409             self.train = ft(*args, **kwargs)
    410             assert isinstance(self.train, LabelList)
    411             kwargs['label_cls'] = self.train.y.__class__

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_folder(self, **kwargs)
    248     def label_from_folder(self, **kwargs)->'LabelList':
    249         "Give a label to each filename depending on its folder."
--> 250         return self.label_from_func(func=lambda o: o.parts[-2], **kwargs)
    251
    252     def label_from_re(self, pat:str, full_path:bool=False, **kwargs)->'LabelList':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_func(self, func, **kwargs)
    244     def label_from_func(self, func:Callable, **kwargs)->'LabelList':
    245         "Apply func to every input to get its label."
--> 246         return self.label_from_list([func(o) for o in self.items], **kwargs)
    247
    248     def label_from_folder(self, **kwargs)->'LabelList':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_list(self, labels, **kwargs)
    220         "Label self.items with labels."
    221         labels = array(labels, dtype=object)
--> 222         label_cls = self.get_label_cls(labels, **kwargs)
    223         y = label_cls(labels, path=self.path, **kwargs)
    224         res = self._label_list(x=self, y=y)

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in get_label_cls(self, labels, label_cls, sep, **kwargs)
    210         if label_cls is not None: return label_cls
    211         if self.label_cls is not None: return self.label_cls
--> 212         it = index_row(labels,0)
    213         if sep is not None: return MultiCategoryList
    214         if isinstance(it, (float, np.float32)): return FloatList

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/core.py in index_row(a, idxs)
    221         if isinstance(res,(pd.DataFrame,pd.Series)): return res.copy()
    222         return res
--> 223     return a[idxs]
    224
    225 def func_args(func)->bool:

IndexError: index 0 is out of bounds for axis 0 with size 0

Sorry to keep going back and forth - Thank you very much for your help, this is really frustrating!

You should put a pair of brackets around '.DCM'; I think it may otherwise be interpreted as ['.', 'D', 'C', 'M'].

Nice catch. @julclu, for future reference, if you are trying to cast '.DCM' in a tuple by itself, you would need to write ('.DCM',) instead of ('.DCM'). Hopefully that solves it :+1:
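
The difference is easy to see in a plain Python session. Parentheses alone do not make a tuple, the trailing comma does, and a bare string falls apart into characters as soon as something iterates over it or turns it into a set:

```python
not_a_tuple = ('.DCM')   # just a parenthesised string
one_tuple = ('.DCM',)    # the comma makes it a one-element tuple

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
print(sorted(set(not_a_tuple)))    # ['.', 'C', 'D', 'M']
print(set(one_tuple))              # {'.DCM'}
```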

@sgugger @jcreinhold
After hours of toying with this issue, I finally solved my problem with a REALLY dumb solution: the extension should not have been capitalized. I think that this is a bug, because my actual DICOM files have capitalized extensions. It should be case-insensitive.
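
For anyone who hits the same thing, a case-insensitive filter is simple to write by hand: lower-case the suffix of each file before comparing it with the wanted extensions. The helper below is a hypothetical sketch with made-up file names, not part of fastai.

```python
from pathlib import PurePosixPath


def has_extension(fname, extensions):
    """True if fname ends in one of `extensions`, ignoring upper/lower case."""
    wanted = {e.lower() for e in extensions}
    return PurePosixPath(fname).suffix.lower() in wanted


names = ["T1/img1.DCM", "T2/imgA.dcm", "T2/notes.txt"]
matches = [n for n in names if has_extension(n, ('.dcm',))]
print(matches)  # ['T1/img1.DCM', 'T2/imgA.dcm']
```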
