Hi, I’m Angel Brisco. I have a project to develop a U-Net for pancreas segmentation. I couldn’t find a way to feed the ImageDataClassifier with DICOM files, especially given the way the files are usually downloaded. I have managed to load my database into a 4D numpy array (patient, number of slices, 512, 512). Is there a way to feed this array to a fastai convolutional learner? Or to feed it in as a PyTorch tensor/dataloader?
I will share my code for loading the DICOM files in a “fastai way” once it is fully functional with the API and readable.
Looking forward to all your answers.
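For the plain-PyTorch route asked about above, here is a minimal sketch, assuming the scans are in a 4D array of shape (patients, slices, 512, 512) and the segmentation masks are in an integer array of the same shape. The function name `volumes_to_slice_dataset` is hypothetical. Each 2D slice becomes one sample, with a channel axis added so that a 2D U-Net receives (batch, 1, H, W) tensors.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def volumes_to_slice_dataset(volumes: np.ndarray, masks: np.ndarray) -> TensorDataset:
    """Flatten (patients, slices, H, W) arrays into per-slice samples.

    Returns a TensorDataset of (image, mask) pairs where each image has
    shape (1, H, W) as float32 and each mask has shape (H, W) as int64.
    """
    if volumes.shape != masks.shape:
        raise ValueError("volumes and masks must have the same shape")
    n_patients, n_slices, h, w = volumes.shape
    # Merge the patient and slice axes: every 2D slice is one sample.
    images = volumes.reshape(n_patients * n_slices, 1, h, w).astype(np.float32)
    labels = masks.reshape(n_patients * n_slices, h, w).astype(np.int64)
    return TensorDataset(torch.from_numpy(images), torch.from_numpy(labels))


# Usage with small dummy arrays (2 patients, 3 slices, 64x64 pixels).
volumes = np.random.rand(2, 3, 64, 64)
masks = (volumes > 0.5).astype(np.uint8)
dataset = volumes_to_slice_dataset(volumes, masks)
loader = DataLoader(dataset, batch_size=4, shuffle=True)
images, targets = next(iter(loader))
```

If you hold out a validation set, split by patient before flattening, so that neighboring slices from the same patient do not end up in both the training and validation sets.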
I haven’t tested this, and there might be a typo, but I think you just need to do something like this:
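import numpy as np
import pydicom
import torch
from fastai.vision import *

def open_dicom(fn):
    "Open a DICOM file and return it as a 3-channel fastai Image."
    # pixel_array is usually 16-bit, so cast to float before making a tensor
    arr = pydicom.dcmread(str(fn)).pixel_array.astype(np.float32)
    x = torch.from_numpy(arr)[None]    # (H, W) -> (1, H, W)
    return Image(x.repeat(3, 1, 1))    # pretrained models expect 3 channels

class DicomItemList(ImageItemList):
    # Same as ImageItemList, but every file is opened with open_dicom
    def open(self, fn): return open_dicom(fn)

data = (DicomItemList.from_folder('path/to/my/data', extensions=('.dcm',))
        .split_by_folder()
        .label_from_folder()
        .databunch(bs=32))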
Replace .split_by_folder and .label_from_folder as appropriate for your use case, as described here. Then you will be able to use the learner API as normal, e.g.,
learn = create_cnn(data, models.resnet18, metrics=accuracy)
learn.fit(1)
Again, this solution probably needs some work since I haven’t tested it, but I am fairly sure the best solution will look something like this. Someone can correct me if I’m wrong.
I read what you recommended, and this seems like the best solution. I will try it.
Thanks
Could you explain what’s happening here:
class DicomItemList(ImageItemList):
    def open(self, fn): return open_dicom(fn)
We are just creating a new ItemList that supports opening DICOM files. This is done by overriding the open method of ImageItemList with an open method that supports DICOM files (i.e., the open_dicom function defined in my previous post).
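The same override pattern in plain Python, without fastai: the base class routes every item access through `self.open(fn)`, so a subclass only has to replace `open` to change how files are read. `BaseItemList` and `DicomLikeItemList` are hypothetical stand-ins for `ImageItemList` and `DicomItemList`, simplified for illustration rather than copied from fastai.

```python
class BaseItemList:
    """Hypothetical stand-in for ImageItemList: stores file names, opens them lazily."""

    def __init__(self, items):
        self.items = list(items)

    def open(self, fn):
        # Default loader; a real image list would decode a PNG/JPEG here.
        return f"image:{fn}"

    def get(self, i):
        # Every access goes through self.open, so subclasses control loading.
        return self.open(self.items[i])


class DicomLikeItemList(BaseItemList):
    """Hypothetical stand-in for DicomItemList: only the loader changes."""

    def open(self, fn):
        # A real version would call pydicom.dcmread(fn).pixel_array here.
        return f"dicom:{fn}"


base = BaseItemList(["a.png"])
dicom = DicomLikeItemList(["scan.dcm"])
print(base.get(0))   # image:a.png
print(dicom.get(0))  # dicom:scan.dcm
```

Everything else (storing items, indexing, and in fastai's case splitting and labeling) is inherited unchanged, which is why a one-line subclass is enough.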
Thanks
Hi, I can’t find an example of label_from_list.
I was able to put the CTs into an ItemList and the labels into another list. Does the label_from_list command take a regular list of items or another ItemList?
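Judging from the tracebacks further down the thread (`labels = array(labels, dtype=object)` inside `label_from_list`), it accepts an ordinary Python list with one label per item, in the same order as the items. For segmentation, the "label" of each item is its mask file, so one option is to build that list by mapping every image path to its mask path. The layout below (an `images/` folder and a parallel `masks/` folder with identical file names) and the helper `mask_path_for` are assumptions for illustration.

```python
from pathlib import Path


def mask_path_for(image_path: Path) -> Path:
    """Map data/images/<name> to data/masks/<name> (hypothetical layout)."""
    return image_path.parent.parent / "masks" / image_path.name


image_paths = [Path("data/images/case01_slice000.npy"),
               Path("data/images/case01_slice001.npy")]
# One label per item, in the same order as the items.
mask_paths = [mask_path_for(p) for p in image_paths]
```

The same function could instead be passed to `label_from_func`, which (per the traceback) calls it on every item and forwards the resulting list to `label_from_list`.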
Hi Angel,
I am having a similar issue with DICOM. I tried replacing open_image with open_dcm_image, as shown below:
def open_dcm_image(fn:PathOrStr, div:bool=True, convert_mode:str='RGB', cls:type=Image)->Image:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning) # EXIF warning from TiffPlugin
        array = pydicom.dcmread(str(fn)).pixel_array
        x = PIL.Image.fromarray(array).convert(convert_mode)
    x = pil2tensor(x,np.float32)
    if div: x.div_(255)
    return cls(x)
because open_image (the default function) looks like this:
def open_image(fn:PathOrStr, div:bool=True, convert_mode:str='RGB', cls:type=Image)->Image:
    "Return `Image` object created from image in file `fn`."
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning) # EXIF warning from TiffPlugin
        x = PIL.Image.open(fn).convert(convert_mode)
    x = pil2tensor(x,np.float32)
    if div: x.div_(255)
    return cls(x)
Then I used the advice from @jcreinhold to create the DicomItemList:
class DicomItemList(ImageItemList):
    def open(self, fn): return open_dcm_image(fn)
But after that, I get an error when trying to load the data with
data = (DicomItemList.from_folder(data_folder, extensions=('.DCM'))
        .split_by_folder()
        .label_from_folder()
        .databunch(bs=32))
which I believe is the right way to load it, given that data_folder is the PosixPath to a folder containing my data, with each folder name being the label of the DCM images inside it.
My error is as follows:
IndexError                                Traceback (most recent call last)
in
----> 1 data = (DicomItemList.from_folder(data_folder, extensions=('.DCM'))
      2         .split_by_folder()
      3         .label_from_folder()
      4         .databunch(bs=32))

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in split_by_folder(self, train, valid)
    175     def split_by_folder(self, train:str='train', valid:str='valid')->'ItemLists':
    176         "Split the data depending on the folder (`train` or `valid`) in which the filenames are."
--> 177         return self.split_by_idxs(self._get_by_folder(train), self._get_by_folder(valid))
    178
    179     def random_split_by_pct(self, valid_pct:float=0.2, seed:int=None)->'ItemLists':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in _get_by_folder(self, name)
    171
    172     def _get_by_folder(self, name):
--> 173         return [i for i in range_of(self) if self.items[i].parts[self.num_parts]==name]
    174
    175     def split_by_folder(self, train:str='train', valid:str='valid')->'ItemLists':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in (.0)
    171
    172     def _get_by_folder(self, name):
--> 173         return [i for i in range_of(self) if self.items[i].parts[self.num_parts]==name]
    174
    175     def split_by_folder(self, train:str='train', valid:str='valid')->'ItemLists':

IndexError: index 0 is out of bounds for axis 0 with size 0
Any help would be appreciated. I also tried this:
np.random.seed(42)
data = ImageDataBunch.from_folder(data_folder, train=".", valid_pct=0.2,
ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
And I got the same IndexError. Any help would be appreciated. Thanks!
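A side note on `open_dcm_image` above: CT pixel data is usually stored as 16-bit integers, so pushing it through `PIL.Image.fromarray(...).convert('RGB')` and dividing by 255 may clip or distort the intensities. One common alternative, sketched below in NumPy, is to convert the raw values to Hounsfield units with the DICOM RescaleSlope/RescaleIntercept tags, clip to an intensity window, and scale to [0, 1]. The function names are hypothetical, and the soft-tissue window (center 40, width 400 HU) is an assumed starting point to tune for the pancreas.

```python
import numpy as np


def ct_to_unit_range(pixels: np.ndarray, slope: float = 1.0, intercept: float = 0.0,
                     center: float = 40.0, width: float = 400.0) -> np.ndarray:
    """Convert raw CT pixel values to a float32 image in [0, 1].

    slope/intercept correspond to the DICOM RescaleSlope/RescaleIntercept
    tags; center/width define the intensity window (assumed defaults).
    """
    # Raw stored values -> Hounsfield units (float32 avoids 16-bit overflow).
    hu = pixels.astype(np.float32) * slope + intercept
    low, high = center - width / 2, center + width / 2
    # Clip to the window, then rescale linearly to [0, 1].
    return (np.clip(hu, low, high) - low) / (high - low)


def to_three_channels(img: np.ndarray) -> np.ndarray:
    """Repeat a (H, W) image into (3, H, W) for ImageNet-pretrained encoders."""
    return np.repeat(img[None, :, :], 3, axis=0)


raw = np.array([[0, 1024], [1064, 4095]], dtype=np.uint16)
img = to_three_channels(ct_to_unit_range(raw, slope=1.0, intercept=-1024.0))
```

With pydicom, the inputs would be `ds.pixel_array`, `float(ds.RescaleSlope)` and `float(ds.RescaleIntercept)` for `ds = pydicom.dcmread(fn)`, assuming those tags are present in your files; the (3, H, W) result could then be turned into a tensor with `torch.from_numpy` before building the fastai `Image`, with no extra division by 255.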
What does your data directory look like? Is it like:
data_folder
|---- train
|     |---- healthy
|     |     |- img1.DCM
|     |     |- ...
|     |
|     |---- disease
|           |- img2.DCM
|           |- ...
|
|---- valid
      |---- healthy
      |     |- imgA.DCM
      |     |- ...
      |
      |---- disease
            |- imgB.DCM
            |- ...
If you are missing the train and valid directories, then don’t use the split_by_folder option. Use whichever of the other split options is relevant to you (see here).
My project is a segmentation one, so I don’t split by folder; I use the random option. I will label by list once I find out how to do it (I’m a surgery resident with little time for this most of the time). But @jcreinhold’s solution works perfectly for me. I even altered it to load a .npy file instead of a DICOM file.
My data directory looks like:
data_folder
|---- T1
|     |- img1.DCM
|---- T2
      |- imgA.DCM
where it should be randomly split between training and testing. I can’t figure out how to do that without getting the IndexError.
Thanks so much for your help.
The function .split_by_folder() looks for data_folder/train/ and data_folder/valid/, doesn’t find them, and that causes the error. Use random_split_by_pct() in place of split_by_folder().
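A quick way to check what `split_by_folder()` will see before building the DataBunch: look for the `train`/`valid` subfolders and count the files under each that carry the extension you pass. An empty item list is consistent with the `index 0 is out of bounds for axis 0 with size 0` errors in this thread. The helper `describe_split_folders` is hypothetical, and the demo rebuilds the T1/T2 layout from the post above in a temporary directory.

```python
import tempfile
from pathlib import Path


def describe_split_folders(root, extension=".dcm", folders=("train", "valid")):
    """For each expected split folder, report None if it is missing, otherwise
    how many files below it have `extension` (compared case-insensitively)."""
    root = Path(root)
    report = {}
    for name in folders:
        folder = root / name
        if not folder.is_dir():
            report[name] = None  # missing folder: split_by_folder finds nothing here
            continue
        report[name] = sum(1 for p in folder.rglob("*")
                           if p.is_file() and p.suffix.lower() == extension.lower())
    return report


with tempfile.TemporaryDirectory() as tmp:
    # Mimic the T1/T2 layout from the thread: no train/valid folders.
    (Path(tmp) / "T1").mkdir()
    (Path(tmp) / "T1" / "img1.DCM").touch()
    flat_report = describe_split_folders(tmp, ".DCM")
    # Add part of the layout split_by_folder expects.
    (Path(tmp) / "train" / "healthy").mkdir(parents=True)
    (Path(tmp) / "train" / "healthy" / "img1.DCM").touch()
    split_report = describe_split_folders(tmp, ".DCM")

print(flat_report)   # {'train': None, 'valid': None}
print(split_report)  # {'train': 1, 'valid': None}
```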
I meant to say that I was using
data = (DicomItemList.from_folder(data_folder, extensions=('.DCM'))
        .label_from_folder()
        .no_split()
        .databunch(bs=32))
I also tried random_split_by_pct(), and I still get the same error.
Thanks again,
Julia
Error for reference:
IndexError                                Traceback (most recent call last)
in
----> 1 data = (DicomItemList.from_folder(data_folder, extensions=('.DCM'))
      2         .label_from_folder()
      3         .no_split()
      4         .databunch(bs=32))

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_folder(self, **kwargs)
    248     def label_from_folder(self, **kwargs)->'LabelList':
    249         "Give a label to each filename depending on its folder."
--> 250         return self.label_from_func(func=lambda o: o.parts[-2], **kwargs)
    251
    252     def label_from_re(self, pat:str, full_path:bool=False, **kwargs)->'LabelList':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_func(self, func, **kwargs)
    244     def label_from_func(self, func:Callable, **kwargs)->'LabelList':
    245         "Apply `func` to every input to get its label."
--> 246         return self.label_from_list([func(o) for o in self.items], **kwargs)
    247
    248     def label_from_folder(self, **kwargs)->'LabelList':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_list(self, labels, **kwargs)
    220         "Label `self.items` with `labels`."
    221         labels = array(labels, dtype=object)
--> 222         label_cls = self.get_label_cls(labels, **kwargs)
    223         y = label_cls(labels, path=self.path, **kwargs)
    224         res = self._label_list(x=self, y=y)

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in get_label_cls(self, labels, label_cls, sep, **kwargs)
    210         if label_cls is not None: return label_cls
    211         if self.label_cls is not None: return self.label_cls
--> 212         it = index_row(labels,0)
    213         if sep is not None: return MultiCategoryList
    214         if isinstance(it, (float, np.float32)): return FloatList

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/core.py in index_row(a, idxs)
    221         if isinstance(res,(pd.DataFrame,pd.Series)): return res.copy()
    222         return res
--> 223     return a[idxs]
    224
    225 def func_args(func)->bool:

IndexError: index 0 is out of bounds for axis 0 with size 0
Order matters. Use no_split() before label_from_folder().
I actually thought it might, and tried that as well, but got a very similar error…
IndexError                                Traceback (most recent call last)
in
----> 1 data = (DicomItemList.from_folder(data_folder, extensions=('.DCM'))
      2         .no_split()
      3         .label_from_folder()
      4         .databunch(bs=16))

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
    407             assert isinstance(fv, Callable)
    408             def _inner(*args, **kwargs):
--> 409                 self.train = ft(*args, **kwargs)
    410                 assert isinstance(self.train, LabelList)
    411                 kwargs['label_cls'] = self.train.y.__class__

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_folder(self, **kwargs)
    248     def label_from_folder(self, **kwargs)->'LabelList':
    249         "Give a label to each filename depending on its folder."
--> 250         return self.label_from_func(func=lambda o: o.parts[-2], **kwargs)
    251
    252     def label_from_re(self, pat:str, full_path:bool=False, **kwargs)->'LabelList':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_func(self, func, **kwargs)
    244     def label_from_func(self, func:Callable, **kwargs)->'LabelList':
    245         "Apply `func` to every input to get its label."
--> 246         return self.label_from_list([func(o) for o in self.items], **kwargs)
    247
    248     def label_from_folder(self, **kwargs)->'LabelList':

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in label_from_list(self, labels, **kwargs)
    220         "Label `self.items` with `labels`."
    221         labels = array(labels, dtype=object)
--> 222         label_cls = self.get_label_cls(labels, **kwargs)
    223         y = label_cls(labels, path=self.path, **kwargs)
    224         res = self._label_list(x=self, y=y)

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in get_label_cls(self, labels, label_cls, sep, **kwargs)
    210         if label_cls is not None: return label_cls
    211         if self.label_cls is not None: return self.label_cls
--> 212         it = index_row(labels,0)
    213         if sep is not None: return MultiCategoryList
    214         if isinstance(it, (float, np.float32)): return FloatList

/data/svcf/software/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/core.py in index_row(a, idxs)
    221         if isinstance(res,(pd.DataFrame,pd.Series)): return res.copy()
    222         return res
--> 223     return a[idxs]
    224
    225 def func_args(func)->bool:

IndexError: index 0 is out of bounds for axis 0 with size 0
Sorry to keep going back and forth. Thank you very much for your help; this is really frustrating!
You should put a pair of brackets around '.DCM'; otherwise I think it may be interpreted as ['.', 'D', 'C', 'M'].
Nice catch. @julclu, for future reference, if you are trying to put '.DCM' in a tuple by itself, you need to write ('.DCM',) instead of ('.DCM'). Hopefully that solves it.
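A quick check of that point in plain Python. This only shows what Python itself does with the parentheses, not how fastai consumes the `extensions` argument internally.

```python
# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = ('.DCM')
a_tuple = ('.DCM',)

print(type(not_a_tuple).__name__)  # str
print(type(a_tuple).__name__)      # tuple
print(list(not_a_tuple))           # ['.', '.', 'C', 'M'] would be wrong; see test

# Membership on a str is a substring test, on a tuple an element-equality test.
print('.D' in not_a_tuple)  # True  (substring of '.DCM')
print('.D' in a_tuple)      # False (no element equals '.D')
```

Because a bare string silently behaves like a sequence of characters (or a substring check), this kind of mistake usually surfaces as "no files found" rather than as a clear error.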
@sgugger @jcreinhold
After hours of toying with this issue, I finally solved my problem with a REALLY dumb fix: the extension should not have been capitalized. I think this is a bug, because my actual DICOM files have capitalized extensions. The matching should be case-insensitive.
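That behavior suggests each file's suffix is lowercased before being compared with the extensions you pass, so an uppercase extension never matches; this is an inference from the behavior described here, not something checked against the fastai source. If you want matching that ignores case on both sides, one option is to build the file list yourself. `find_files` is a hypothetical helper, not fastai's own file-listing function; it also guards against a bare string being passed instead of a tuple.

```python
import tempfile
from pathlib import Path


def find_files(root, extensions):
    """Recursively list files whose suffix matches one of `extensions`,
    ignoring case on both the file names and the requested extensions."""
    if isinstance(extensions, str):
        extensions = (extensions,)  # guard against the ('.DCM') pitfall
    wanted = {e.lower() for e in extensions}
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.suffix.lower() in wanted)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.DCM", "b.dcm", "c.png"):
        (Path(tmp) / name).touch()
    found = [p.name for p in find_files(tmp, ".DCM")]

print(found)  # ['a.DCM', 'b.dcm']
```

Normalizing both sides keeps the filter independent of how the scanner or download tool happened to case the extension.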