Image classification with DICOM images

Thank you @jeremy for creating this great community. I have learned a lot about ML and DL through fastai.

Hi everyone,

I am trying to do an image classification task on medical images, for example, normal images vs. images with artifacts. However, most medical images are in the DICOM format. I know that pydicom can handle DICOM images, but integrating pydicom into the fastai library may require some experience.

Does anyone have experience modifying the fastai library so that it would support not only “.jpg” but also “.dcm” images? I really appreciate your comments and suggestions.
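For context, reading a single file with pydicom seems straightforward; roughly something like this (an untested sketch, the path is just an example):

import numpy as np
import pydicom

ds = pydicom.dcmread('sample.dcm')       # example path to a single DICOM file
px = ds.pixel_array.astype(np.float32)   # pixel data as a NumPy array
rgb = np.stack([px] * 3, axis=-1)        # fake RGB, since pretrained models expect 3 channels

The hard part seems to be getting arrays like this to flow through the fastai data loading code.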

Best regards,
Hung


@HungDO https://www.kaggle.com/gzuidhof/full-preprocessing-tutorial may be helpful for understanding and working with DICOM images.
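If I remember right, that tutorial loads a full CT scan with pydicom and rescales the raw values to Hounsfield units; the core of that step looks roughly like this (a sketch only, paths and details are placeholders):

import os
import numpy as np
import pydicom

path = 'input/sample_images/patient01'   # placeholder folder holding one scan
slices = [pydicom.dcmread(os.path.join(path, f)) for f in os.listdir(path)]
slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))   # order slices by position

# Convert stored values to Hounsfield units using the rescale tags.
volume = np.stack([s.pixel_array for s in slices]).astype(np.int16)
volume = volume * int(slices[0].RescaleSlope) + int(slices[0].RescaleIntercept)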

Someone on Kaggle has recommended https://www.coursera.org/learn/neurohacking


Hi @LaPatel,

Thank you so much for your suggestion, I really appreciate it. I will have a look at the link and learn how to manipulate DICOM files (*.dcm).

My final goal is to modify the data loader (ImageClassifierData and its sub-functions) so that it supports *.dcm files. To my understanding, the data loader currently only supports *.jpg and *.png files.

Below is the command that I hope to eventually achieve:
data = ImageClassifierData.from_csv(…, suffix='.dcm', …)
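For reference, the call I use today with JPEGs (following the dog breeds notebook) looks roughly like this; paths and sizes are placeholders, and the hope is that only the suffix argument would need to change:

from fastai.conv_learner import *

PATH = 'data/artifact_study/'        # placeholder dataset folder
label_csv = f'{PATH}labels.csv'
n = len(list(open(label_csv))) - 1   # number of labelled images
val_idxs = get_cv_idxs(n)            # hold out 20% for validation

tfms = tfms_from_model(resnet34, 224)
data = ImageClassifierData.from_csv(PATH, 'train', label_csv, bs=64, tfms=tfms,
                                    val_idxs=val_idxs, suffix='.jpg')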

@HungDO Were you able to modify the DataLoader to support DICOM files? I also have DICOM files I would like to import directly instead of converting them to png first.

I made the following change to fastai/dataset.py and then successfully read DICOM images:

@@ -1,7 +1,10 @@
 from PIL.ImageFile import ImageFile
 from .dataloader import DataLoader
 from .transforms import *
-
+try:
+    import pydicom
+except:
+    pass
 
 def get_cv_idxs(n, cv_idx=0, val_pct=0.2, seed=42):
     """ Get a list of index values for Validation set from a dataset
@@ -235,6 +238,13 @@ class BaseDataset(Dataset):
         """True if the data set is used to train regression models."""
         return False
 
+def isdicom(fn):
+  if fn.endswith('.dcm'):
+    return True
+  with open(fn) as fh:
+    fh.seek(0x80)
+    return fh.read(4)=='DICM'
+
 def open_image(fn):
     """ Opens an image using OpenCV given the file path.
 
@@ -258,6 +268,15 @@ def open_image(fn):
                 req = urllib.urlopen(str(fn))
                 image = np.asarray(bytearray(req.read()), dtype="uint8")
                 im = cv2.imdecode(image, flags).astype(np.float32)/255
+            elif isdicom(fn):
+                slice = pydicom.read_file(fn)
+                if slice.PhotometricInterpretation.startswith('MONOCHROME'):
+                    # Make a fake RGB image
+                    im = np.stack([slice.pixel_array]*3,-1)
+                elif slice.PhotometricInterpretation == 'RGB':
+                    im = slice.pixel_array
+                else:
+                    raise OSError('Unsupported DICOM image with PhotometricInterpretation=={}'.format(slice.PhotometricInterpretation))
             else:
                 im = cv2.imread(str(fn), flags).astype(np.float32)/255
             if im is None: raise OSError(f'File not recognized by opencv: {fn}')

Note that you will have to install pydicom through pip install pydicom to get this to work.

I also haven’t tested anything but a monochrome image with a .dcm extension, so the other code paths might not work.
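If you want to sanity-check the patch on your own files, something like this should do (the path is a placeholder):

from fastai.dataset import open_image   # picks up the patched version above

im = open_image('data/scans/example.dcm')       # placeholder path to a .dcm file
print(im.shape, im.dtype, im.min(), im.max())   # expect a 3-channel array for MONOCHROME images

Note that, unlike the OpenCV path, the DICOM branch above does not divide by 255, so the values may not be in [0, 1].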


FYI: DICOM support has now been merged into the fastai master branch, so the above (buggy) patch is no longer needed.

See also the following Kaggle kernel, which shows an example of how to use DICOM images with fastai:

https://www.kaggle.com/dovgro/fastai-exploration


I believe the Dicom support that was merged into the master branch is broken. Lesson1.ipynb now gives the following error when trying to create a learner:


UnicodeDecodeError                        Traceback (most recent call last)
in <module>()
      1 arch=resnet34
      2 data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
----> 3 learn = ConvLearner.pretrained(arch, data, precompute=True)
      4 learn.fit(0.01, 2)

/usr/local/lib/python3.6/dist-packages/fastai/conv_learner.py in pretrained(cls, f, data, ps, xtra_fc, xtra_cut, custom_head, precompute, pretrained, **kwargs)
    112 models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg,
    113 ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut, custom_head=custom_head, pretrained=pretrained)
--> 114 return cls(data, models, precompute, **kwargs)
    115
    116 @classmethod

/usr/local/lib/python3.6/dist-packages/fastai/conv_learner.py in __init__(self, data, models, precompute, **kwargs)
     98 if hasattr(data, 'is_multi') and not data.is_reg and self.metrics is None:
     99 self.metrics = [accuracy_thresh(0.5)] if self.data.is_multi else [accuracy]
--> 100 if precompute: self.save_fc1()
    101 self.freeze()
    102 self.precompute = precompute

/usr/local/lib/python3.6/dist-packages/fastai/conv_learner.py in save_fc1(self)
    177 m=self.models.top_model
    178 if len(self.activations[0])!=len(self.data.trn_ds):
--> 179 predict_to_bcolz(m, self.data.fix_dl, act)
    180 if len(self.activations[1])!=len(self.data.val_ds):
    181 predict_to_bcolz(m, self.data.val_dl, val_act)

/usr/local/lib/python3.6/dist-packages/fastai/model.py in predict_to_bcolz(m, gen, arr, workers)
     15 lock=threading.Lock()
     16 m.eval()
---> 17 for x,*_ in tqdm(gen):
     18 y = to_np(m(VV(x)).data)
     19 with lock:

/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py in __iter__(self)
    935 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    936
--> 937 for obj in iterable:
    938 yield obj
    939 # Update and possibly print the progressbar.

/usr/local/lib/python3.6/dist-packages/fastai/dataloader.py in __iter__(self)
     86 # avoid py3.6 issue where queue is infinite and can result in memory exhaustion
     87 for c in chunk_iter(iter(self.batch_sampler), self.num_workers*10):
---> 88 for batch in e.map(self.get_batch, c):
     89 yield get_tensor(batch, self.pin_memory, self.half)
     90

/usr/lib/python3.6/concurrent/futures/_base.py in result_iterator()
    584 # Careful not to keep a reference to the popped future
    585 if timeout is None:
--> 586 yield fs.pop().result()
    587 else:
    588 yield fs.pop().result(end_time - time.time())

/usr/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    423 raise CancelledError()
    424 elif self._state == FINISHED:
--> 425 return self.__get_result()
    426
    427 self._condition.wait(timeout)

/usr/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    382 def __get_result(self):
    383 if self._exception:
--> 384 raise self._exception
    385 else:
    386 return self._result

/usr/lib/python3.6/concurrent/futures/thread.py in run(self)
     54
     55 try:
---> 56 result = self.fn(*self.args, **self.kwargs)
     57 except BaseException as exc:
     58 self.future.set_exception(exc)

/usr/local/lib/python3.6/dist-packages/fastai/dataloader.py in get_batch(self, indices)
     73
     74 def get_batch(self, indices):
---> 75 res = self.np_collate([self.dataset[i] for i in indices])
     76 if self.transpose: res[0] = res[0].T
     77 if self.transpose_y: res[1] = res[1].T

/usr/local/lib/python3.6/dist-packages/fastai/dataloader.py in <listcomp>(.0)
     73
     74 def get_batch(self, indices):
---> 75 res = self.np_collate([self.dataset[i] for i in indices])
     76 if self.transpose: res[0] = res[0].T
     77 if self.transpose_y: res[1] = res[1].T

/usr/local/lib/python3.6/dist-packages/fastai/dataset.py in __getitem__(self, idx)
    201 xs,ys = zip(*[self.get1item(i) for i in range(*idx.indices(self.n))])
    202 return np.stack(xs),ys
--> 203 return self.get1item(idx)
    204
    205 def __len__(self): return self.n

/usr/local/lib/python3.6/dist-packages/fastai/dataset.py in get1item(self, idx)
    194
    195 def get1item(self, idx):
--> 196 x,y = self.get_x(idx),self.get_y(idx)
    197 return self.get(self.transform, x, y)
    198

/usr/local/lib/python3.6/dist-packages/fastai/dataset.py in get_x(self, i)
    297 super().__init__(transform)
    298 def get_sz(self): return self.transform.sz
--> 299 def get_x(self, i): return open_image(os.path.join(self.path, self.fnames[i]))
    300 def get_n(self): return len(self.fnames)
    301

/usr/local/lib/python3.6/dist-packages/fastai/dataset.py in open_image(fn)
    266 elif os.path.isdir(fn) and not str(fn).startswith("http"):
    267 raise OSError('Is a directory: {}'.format(fn))
--> 268 elif isdicom(fn):
    269 slice = pydicom.read_file(fn)
    270 if slice.PhotometricInterpretation.startswith('MONOCHROME'):

/usr/local/lib/python3.6/dist-packages/fastai/dataset.py in isdicom(fn)
    250 with open(fn) as fh:
    251 fh.seek(0x80)
--> 252 return fh.read(4)=='DICM'
    253
    254 def open_image(fn):

/usr/lib/python3.6/encodings/ascii.py in decode(self, input, final)
     24 class IncrementalDecoder(codecs.IncrementalDecoder):
     25 def decode(self, input, final=False):
---> 26 return codecs.ascii_decode(input, self.errors)[0]
     27
     28 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 30: ordinal not in range(128)

Reverting to cb121994872fbd5f4ee67de01bcb9848a7e54a6b causes lesson1 to start working again.
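For what it's worth, the traceback points at isdicom opening every image in text mode; decoding raw JPEG bytes with a text codec is what fails. A minimal reproduction (the path is just a placeholder):

# Opening a binary image in text mode means read() must decode bytes with a
# text codec, and the first non-ASCII byte (JPEGs are full of 0xff) raises
# UnicodeDecodeError -- the same failure as in the traceback above.
with open('data/dogscats/train/cats/cat.0.jpg') as fh:
    fh.seek(0x80)
    fh.read(4)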

Sorry about that. I have provided a pull request that should fix the error. Please try again once the pull request has been merged.
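For anyone following along, the essence of the fix is to do the DICM signature check in binary mode and compare against bytes; a rough sketch (the merged pull request may differ slightly):

def isdicom(fn):
    '''True if fn looks like a DICOM file.'''
    if str(fn).endswith('.dcm'):
        return True
    # DICOM files start with a 128-byte preamble followed by the ASCII
    # marker 'DICM'; reading in binary mode means ordinary JPEG/PNG files
    # never go through a text codec, so no UnicodeDecodeError.
    with open(fn, 'rb') as fh:
        fh.seek(0x80)
        return fh.read(4) == b'DICM'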

This is my solution: monkey-patch get_x on each dataset so that it reads the DICOM file directly with pydicom.

import os
import types

import numpy as np
import pydicom

def new_get_x(self, i):
    # Read the DICOM directly and return a fake-RGB float array,
    # matching what the rest of the fastai pipeline expects.
    fn = os.path.join(self.path, self.fnames[i])
    img = pydicom.dcmread(fn)
    img = img.pixel_array.astype(np.float32) / 255
    return np.stack([img] * 3, axis=2)


md = ImageClassifierData.from_names_and_array(......)

# types.MethodType binds new_get_x as a bound method on each existing dataset.
md.trn_ds.get_x = types.MethodType(new_get_x, md.trn_ds)
md.val_ds.get_x = types.MethodType(new_get_x, md.val_ds)
md.test_ds.get_x = types.MethodType(new_get_x, md.test_ds)

Hi, I am a beginner and am running Lesson1.ipynb. It gives the error below:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 30: invalid start byte

Do you know how I can fix this? Thank you.

I believe a fix has been pushed. Do a:
git pull
to get that fix.

No, I have tried git pull but the problem is still there. Then I tried deleting all the files in the fastai folder and cloning a fresh copy with git; still no change.

Problem solved by git pull now. Thank you!

Awesome! Was looking for this.

Is this for the v0.7 library only, or were these changes also ported over into the v1.0 library?
