Load dicom to image classifier

Confirmed it's a bug; we'll replace it with a case-insensitive match (tomorrow).


Lots of interesting things raised from a simple question… Here's another simple question: to use label_from_list, should I provide an ItemList or a regular list of items as [Items]? I tried with an ItemList and it gives me an error.
Thanks for your help and your commitment to this topic 😀

The answer is you shouldn't use label_from_list. I forgot to make it private, but I will do so today; it's for internal use only.

I read about it in the docs. If I shouldn't use it, what should I use to assign segmentation labels to a group of images?

The problem with lists is that after a split, your lists won't match anymore. Create a function that maps each filename to its label; it's more reliable.
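A minimal, fastai-free sketch of why the mapping function is more robust than a parallel list: a random split shuffles the items, so labels paired by position drift apart, while a function that derives the label from the filename stays correct. The folder layout (`images/<name>.png` with `masks/<name>_mask.png`) and the name `get_mask_path` are hypothetical.

```python
import random
from pathlib import PurePosixPath

# Hypothetical layout: each image images/<name>.png has a mask masks/<name>_mask.png
items = [PurePosixPath(f"images/ct_{i:03d}.png") for i in range(10)]
labels_list = [PurePosixPath(f"masks/ct_{i:03d}_mask.png") for i in range(10)]  # same order as items

def get_mask_path(img_path):
    """Map one image path to its mask path; depends only on the file name, not on list order."""
    img_path = PurePosixPath(img_path)
    return PurePosixPath("masks") / f"{img_path.stem}_mask.png"

# Simulate a random train/valid split, which shuffles the items
random.seed(0)
shuffled = random.sample(items, len(items))
valid, train = shuffled[:2], shuffled[2:]

# Labels paired by position no longer line up with the shuffled items...
by_position = list(zip(train, labels_list[:len(train)]))
# ...while the mapping function gives the right mask for every item
by_function = [(p, get_mask_path(p)) for p in train]
```

In fastai v1 the same idea goes through the data block API, e.g. `SegmentationItemList.from_folder(path_img).split_by_rand_pct().label_from_func(get_mask_path, classes=codes)`, where `path_img` and `codes` are placeholders and the function would need to return a mask path your loader can actually open.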


Hi, I've managed to fit all my data (2D CT slices with 2D labels) into a DataBunch. When I try to train the U-Net learner, I get a broken pipe error:

in
----> 1 learn.fit_one_cycle(5)

~\Anaconda3\envs\fastai\lib\site-packages\fastai\train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     19 callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     20 pct_start=pct_start, **kwargs))
---> 21 learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     22
     23 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

~\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_train.py in fit(self, epochs, lr, wd, callbacks)
    164 callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    165 fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 166 callbacks=self.callbacks+callbacks)
    167
    168 def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92 except Exception as e:
     93 exception = e
---> 94 raise e
     95 finally: cb_handler.on_train_end(exception)
     96

~\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     80 cb_handler.on_epoch_begin()
     81
---> 82 for xb,yb in progress_bar(data.train_dl, parent=pbar):
     83 xb, yb = cb_handler.on_batch_begin(xb, yb)
     84 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)

~\Anaconda3\envs\fastai\lib\site-packages\fastprogress\fastprogress.py in __iter__(self)
     63 self.update(0)
     64 try:
---> 65 for i,o in enumerate(self._gen):
     66 yield o
     67 if self.auto_update: self.update(i+1)

~\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_data.py in __iter__(self)
     68 def __iter__(self):
     69 "Process and returns items from DataLoader."
---> 70 for b in self.dl:
     71 #y = b[1][0] if is_listy(b[1]) else b[1] # XXX: Why is this line here?
     72 yield self.proc_batch(b)

~\Anaconda3\envs\fastai\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    817
    818 def __iter__(self):
--> 819 return _DataLoaderIter(self)
    820
    821 def __len__(self):

~\Anaconda3\envs\fastai\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    558 # before it starts, and __del__ tries to join but will get:
    559 # AssertionError: can only join a started process.
--> 560 w.start()
    561 self.index_queues.append(index_queue)
    562 self.workers.append(w)

~\Anaconda3\envs\fastai\lib\multiprocessing\process.py in start(self)
    103 'daemonic processes are not allowed to have children'
    104 _cleanup()
--> 105 self._popen = self._Popen(self)
    106 self._sentinel = self._popen.sentinel
    107 # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\envs\fastai\lib\multiprocessing\context.py in _Popen(process_obj)
    221 @staticmethod
    222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
    224
    225 class DefaultContext(BaseContext):

~\Anaconda3\envs\fastai\lib\multiprocessing\context.py in _Popen(process_obj)
    320 def _Popen(process_obj):
    321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
    323
    324 class SpawnContext(BaseContext):

~\Anaconda3\envs\fastai\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     63 try:
     64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
     66 finally:
     67 set_spawning_popen(None)

~\Anaconda3\envs\fastai\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
     61
     62 #

BrokenPipeError: [Errno 32] Broken pipe

I can't find where the problem is.

Try setting num_workers=0 when you create your DataBunch.
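Some context for why this helps: the traceback ends in `popen_spawn_win32` and `ForkingPickler(file, protocol).dump(obj)`. On Windows, DataLoader worker processes are started with spawn, so the dataset and everything it references (including any labelling function) must be pickled and sent to each worker; `num_workers=0` avoids worker processes entirely. If you later want workers back, one thing worth checking (an assumption about the cause, not confirmed in this thread) is whether anything in the pipeline is unpicklable, such as a lambda or a function defined inside another function. `can_pickle` and `make_local_labeller` are hypothetical helpers.

```python
import pickle

def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, which spawn-started worker processes require."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

def make_local_labeller():
    def label(fn):  # defined inside another function, so pickle cannot look it up by name
        return str(fn).endswith(".dcm")
    return label

builtin_ok = can_pickle(len)                   # True: builtins are pickled by reference
lambda_ok = can_pickle(lambda fn: fn)          # False: lambdas have no importable name
local_ok = can_pickle(make_local_labeller())   # False: local functions can't be pickled either
```

Separately, when running a plain script (rather than a notebook) on Windows, the training code also needs to sit under an `if __name__ == '__main__':` guard for spawn-based workers to start safely.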


@Angel, did you try @pierreguillou’s solution? This is also what is suggested here.

When I set num_workers=0, it says "CUDA error: unspecified launch failure".
After reinstalling CUDA that problem went away, but now I get
"Expected object of scalar type Long but got scalar type Float for argument #2 'target'".
I have tried setting the target's NumPy data type to int (since it's a label, it only contains 0s and 1s), and that doesn't work either.

Call .long() on the target tensor where you load it (but after converting it to a torch tensor).
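A small sketch of that fix in plain PyTorch, with a made-up 2×2 binary mask standing in for a real label. Cross-entropy needs class-index targets of dtype int64 (Long). A possible reason why setting the NumPy dtype to int didn't help: on Windows with NumPy 1.x, `astype(int)` gives int32, and `torch.from_numpy` keeps that dtype, so the tensor ends up as `torch.int32` rather than Long. Calling `.long()` on the tensor avoids depending on platform defaults.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Made-up binary mask as it might come out of a loader (float 0.0 / 1.0 values)
mask_np = np.array([[0., 1.], [1., 0.]], dtype=np.float32)

target = torch.from_numpy(mask_np)  # float32 tensor: not valid as class-index targets
target = target.long()              # convert after creating the tensor -> torch.int64

# Logits for 2 classes over a 2x2 image, batch of 1: shape (N, C, H, W)
logits = torch.zeros(1, 2, 2, 2)
# Target shape (N, H, W) with int64 class indices; all-zero logits give a loss of ln(2)
loss = F.cross_entropy(logits, target.unsqueeze(0))
```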


It worked, the model is running!!!
Thanks to all of you for your help.
My next task is to find out why it's incredibly slow and only uses the GPU in spikes rather than all the time.

Does it have something to do with num_workers=0, or is there another bug somewhere in my bug-infested code? 🤦‍♂️
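Nobody answers this in the thread, so the following is only a hedged suggestion. With `num_workers=0`, every batch is read and transformed in the main process, so the GPU can sit idle while the next batch is prepared, which would match the spiky utilization described. A rough check is to time how long the loop waits for each batch versus how long it spends on the step itself. `profile_loader` is a hypothetical helper; for GPU work, CUDA calls are asynchronous, so your `step_fn` should call `torch.cuda.synchronize()` before returning to get meaningful timings.

```python
import time
from typing import Callable, Iterable, Tuple

def profile_loader(loader: Iterable, step_fn: Callable, n_batches: int = 20) -> Tuple[float, float]:
    """Return (seconds spent waiting for batches, seconds spent in step_fn) over up to n_batches."""
    wait = work = 0.0
    it = iter(loader)
    for _ in range(n_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)   # time to produce the batch (loading + transforms)
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)         # time for the model step
        t2 = time.perf_counter()
        wait += t1 - t0
        work += t2 - t1
    return wait, work

def slow_loader():
    for i in range(5):
        time.sleep(0.01)       # simulate slow file decoding in the main process
        yield i

wait, work = profile_loader(slow_loader(), step_fn=lambda b: None)
```

If waiting dominates when you run this on your own loader (for example a fastai `data.train_dl`), data loading rather than the model is the likely bottleneck.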

Hi, me again. I'm having a problem loading from a CSV. It says "Failed to interpret file 'pancreas ct\Pancreas-CT\.\PANCREAS_0001\000004.dcm' as a pickle". I don't understand why it puts that dot between the path and the patient's folder, or why it has any trouble at all, since I made the CSV with the dataset.to_csv(fn) function.
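On the dot: `.` is a relative-path component meaning "this folder", and it appears when a relative path such as `.\PANCREAS_0001\...` is joined onto a base folder; Windows resolves it to the same file, so it is probably harmless. The error text itself looks like the message `numpy.load` raises when a file is neither a `.npy`/`.npz` array nor a pickle, which would suggest something is handing the `.dcm` file to `np.load` instead of a DICOM reader (an inference from the wording, not confirmed in the thread). The sketch below shows where the dot comes from and how to normalize it; splitting the path into `base` and `rel` is an assumption.

```python
import ntpath
from pathlib import PureWindowsPath

base = r"pancreas ct\Pancreas-CT"    # assumed base folder
rel = r".\PANCREAS_0001\000004.dcm"  # assumed relative path as stored in the CSV

joined = ntpath.join(base, rel)                 # plain string joining keeps the '.' component
normalized = ntpath.normpath(joined)            # normpath drops '.' segments
via_pathlib = str(PureWindowsPath(base) / rel)  # pathlib collapses '.' automatically
```

`ntpath` and `PureWindowsPath` apply Windows path rules on any OS, so the result is the same wherever you run it.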

Try making a custom ImageItemList and overriding the labelling function:

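from fastai.vision import *   # fastai v1: ImageItemList, Image, CategoryList, pil2tensor, get_transforms
from pathlib import Path
import numpy as np
import PIL.Image
import pydicom as dcm

class DicomItemList(ImageItemList):
    def open(self, fn):
        "Read a DICOM file and return it as a 3-channel fastai Image scaled to [0, 1]."
        dicom_img = dcm.dcmread(str(fn))
        img = PIL.Image.fromarray(dicom_img.pixel_array).convert('RGB')
        return Image(pil2tensor(img, dtype=np.float32).div_(255))

    def dicom_labels(self, df, **kwargs)->'LabelList':
        """Custom labels: look up each file's name (without .dcm) in df's 'Target' column."""
        file_names = [Path(f).stem for f in self.items]   # portable, unlike splitting on '/'
        labels = df.loc[file_names, 'Target'].values
        y = CategoryList(items=labels)
        return self._label_list(x=self, y=y)

def get_data(bs, sz):
    # `path` (folder of .dcm files) and `df` (indexed by file name) must already be defined
    train_ds = (DicomItemList.from_folder(path, extensions='.dcm')
        .random_split_by_pct()
        .dicom_labels(df)
        .transform(tfms=get_transforms(do_flip=False, max_rotate=0., max_warp=0), size=sz)
        .databunch(num_workers=0, bs=bs)   # use the bs argument rather than a hard-coded 16
        .normalize())
    return train_ds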
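Why the filename lookup uses `Path(...).stem` rather than splitting on `'/'`: Windows paths use backslashes, so `split('/')` leaves the whole path intact and the lookup key never matches the DataFrame index. A small check using a path from this thread (`PureWindowsPath` behaves the same on any OS):

```python
from pathlib import PureWindowsPath

fn = PureWindowsPath(r"pancreas ct\Pancreas-CT\PANCREAS_0001\000004.dcm")

# Split on '/' and drop the last 4 characters (".dcm"):
# with '\' separators nothing is split, so the folders stay in the key
old_name = str(fn).split('/')[-1][:-4]

# pathlib strips both the folders and the extension
new_name = fn.stem
```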

Hi, thanks. I forgot to tell you all: I have made a custom to_csv and from_csv, and they work. I'll share the notebook with the functions and classes in a few days. I have written some functions for handling DICOM files that could be useful to others (if anyone else is interested in CT analysis).
As a funny aside, when you see the notebook (if you do), you'll notice I solved the problem of fitting a ResNet U-Net to single-channel images by adding 2 more identical channels (Argentinian style). I know I should build the dynamic U-Net with a custom ResNet whose input layer I've modified, but I'll do that later.
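For the "modified input layer" route, one common approach (not the poster's code) is to replace the first 3-channel convolution with a 1-channel one whose kernel is the sum of the three RGB kernels. Because convolution is linear, this gives exactly the same output as feeding the image repeated across three channels, so the pretrained weights keep their meaning. In the sketch, a randomly initialised `nn.Conv2d` with ResNet's stem shape stands in for pretrained weights, and it does not show how to plug the layer into fastai's U-Net.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for a pretrained ResNet stem conv (same shape as ResNet's conv1); weights are random here
conv3 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

x = torch.rand(2, 1, 64, 64)  # batch of single-channel CT slices (N, 1, H, W)

# Option 1 (what the post does): repeat the single channel three times
out_repeat = conv3(x.repeat(1, 3, 1, 1))

# Option 2: a 1-channel conv whose kernel is the sum of the three input-channel kernels
conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    conv1.weight.copy_(conv3.weight.sum(dim=1, keepdim=True))
out_single = conv1(x)
```

The equivalence only holds if the three channels are still identical when they reach the convolution; per-channel normalization (for example with ImageNet statistics) applied after repeating would break it.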


I was trying to find the correct learning rate with lr_find, and it outputs a graph with 0 loss for every learning rate. Does anyone have any idea what the problem might be?
I can see a CT scan when I call data.train.x[0] and its label with data.train.y[0], so I'm sure x and y are not the same.
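The thread doesn't diagnose this, so this is only one thing worth ruling out (an assumption, not a confirmed cause). In pancreas CT, the organ typically covers a small part of each slice and many slices contain none of it. If the masks you load are mostly or entirely empty, or get scaled to 0 somewhere along the way, the model can reach a near-zero loss by predicting background everywhere. The hypothetical helper `mask_stats` takes masks as NumPy arrays (how you extract them from your DataBunch depends on your setup) and reports how many are empty and what fraction of pixels are foreground.

```python
import numpy as np

def mask_stats(masks):
    """Return (fraction of masks with no foreground pixels, fraction of all pixels that are foreground)."""
    masks = [np.asarray(m) for m in masks]
    n_empty = sum(1 for m in masks if not (m > 0).any())
    n_fg = sum(int((m > 0).sum()) for m in masks)
    n_total = sum(m.size for m in masks)
    return n_empty / len(masks), n_fg / n_total

# Toy example: one empty 4x4 mask and one with 4 foreground pixels on the diagonal
empty_frac, fg_frac = mask_stats([np.zeros((4, 4)), np.eye(4)])
print(empty_frac, fg_frac)  # 0.5 0.125
```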

Hi, this extension of ImageItemList is not working for me. I'm interested in it because it shows how to load input images from DICOM files on disk while taking the labels from a CSV.
I have changed ImageItemList to ImageList, but that is not enough. Can you please help me get it working?

Sure, can you post a screenshot of the error or paste the exact error message?

Yes, here’s what I got:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-18-8c07244c1674> in <module>
     31     return train_ds
     32 
---> 33 data = get_data(16, 128)
     34 print(data)

<ipython-input-18-8c07244c1674> in get_data(bs, sz)
     25     train_ds = (DicomItemList.from_folder(dir_data_train, extensions='.dcm')
     26         .random_split_by_pct()
---> 27         .dicom_labels(df_labels)
     28         .transform(tfms=get_transforms(do_flip=False,max_rotate=0.,max_warp=0), size=sz)
     29         .databunch(num_workers=0,bs=16)

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
    475             self.valid = fv(*args, from_item_lists=True, **kwargs)
    476             self.__class__ = LabelLists
--> 477             self.process()
    478             return self
    479         return _inner

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in process(self)
    529         "Process the inner datasets."
    530         xp,yp = self.get_processors()
--> 531         for ds,n in zip(self.lists, ['train','valid','test']): ds.process(xp, yp, name=n)
    532         #progress_bar clear the outputs so in some case warnings issued during processing disappear.
    533         for ds in self.lists:

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in process(self, xp, yp, name)
    708                         if len(warnings) > 5: self.warn += "..."
    709                     p.warns = []
--> 710                 self.x,self.y = self.x[~filt],self.y[~filt]
    711         self.x.process(xp)
    712         return self

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in __getitem__(self, idxs)
    117         idxs = try_int(idxs)
    118         if isinstance(idxs, Integral): return self.get(idxs)
--> 119         else: return self.new(self.items[idxs], inner_df=index_row(self.inner_df, idxs))
    120 
    121     @classmethod

IndexError: boolean index did not match indexed array along dimension 0; dimension is 2135 but corresponding boolean dimension is 2290
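The thread doesn't explain this error, but the traceback does say the label-based filter built in `process()` has 2290 entries while the item list has 2135, i.e. more labels than items. One possible way to get that (a hypothesis, not confirmed here) is a label table with duplicate index entries: in Pancreas-CT, file names such as `000004.dcm` appear to repeat across patient folders, and `df.loc[list_of_names]` returns every matching row, so a name shared by two patients yields two labels. The items and tables below are made up to illustrate the mechanism.

```python
import pandas as pd
from pathlib import PurePosixPath

# Made-up items: slices from two patients whose files share a name
items = [PurePosixPath("PANCREAS_0001/000004.dcm"),
         PurePosixPath("PANCREAS_0002/000004.dcm"),
         PurePosixPath("PANCREAS_0002/000005.dcm")]

# Made-up label table indexed by file name (without .dcm), the way dicom_labels looks labels up
df_by_stem = pd.DataFrame({"Target": [0, 1, 1]}, index=["000004", "000004", "000005"])
stems = [p.stem for p in items]
labels_by_stem = df_by_stem.loc[stems, "Target"]  # each duplicated name matches two rows

# Keying the table by a unique identifier (here the relative path) gives one label per item
keys = [str(p) for p in items]
df_by_path = pd.DataFrame({"Target": [0, 1, 1]}, index=keys)
labels_by_path = df_by_path.loc[keys, "Target"]

print(len(items), len(labels_by_stem), len(labels_by_path))  # 3 5 3
```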

Try this way:

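from fastai.vision import *   # fastai v1: ImageList, Image, pil2tensor
import numpy as np
import PIL.Image
import pydicom as dcm

def open_dicom(fn):
    "Read a DICOM file into a 3-channel fastai Image scaled to [0, 1]."
    dicom_img = dcm.dcmread(str(fn))
    img = PIL.Image.fromarray(dicom_img.pixel_array).convert('RGB')
    return Image(pil2tensor(img, dtype=np.float32).div_(255))

class DicomItemList(ImageList):
    def open(self, fn): return open_dicom(fn)

# dir_path, get_labels_function and classes are defined elsewhere in your notebook
src = (DicomItemList.from_folder(dir_path, extensions='.dcm')
       .split_by_rand_pct()
       .label_from_func(get_labels_function, classes=classes))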

Your get_labels_function has to take one item and return its label. For example, given image files such as PosixPath('Train_Data/image01.dcm') (each of these is called an item) and a DataFrame with the labels, write a function like this (label_from_func passes only the item, so bind df beforehand, e.g. with functools.partial):

def get_labels_func(item:PosixPath, df:DataFrame):
    # your code: look up `item` in `df`
    return label
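A minimal filled-in version of such a function, assuming a hypothetical DataFrame indexed by file name with a `label` column. The `df` argument is bound with `functools.partial`, so the result takes a single item, as `label_from_func` requires.

```python
from functools import partial
from pathlib import Path
import pandas as pd

# Hypothetical label table: one row per DICOM file name, with a 'label' column
df = pd.DataFrame({"fname": ["image01.dcm", "image02.dcm"],
                   "label": ["normal", "tumor"]}).set_index("fname")

def get_labels_func(item, df):
    "Return the label of one item (a file path) by looking up its file name in df."
    return df.loc[Path(item).name, "label"]

# label_from_func calls the function with the item only, so bind df in advance
get_labels_function = partial(get_labels_func, df=df)

print(get_labels_function(Path("Train_Data/image01.dcm")))  # normal
```

The resulting `get_labels_function` can then go into `.label_from_func(get_labels_function, classes=classes)`. Unlike a lambda, a `partial` of a module-level function can be pickled, which matters if you later use `num_workers > 0` on Windows.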

After that, you can chain .transform, .databunch and .normalize.

Thank you. Yes, this approach works; I already use it.
It's just that your previous approach was somewhat different, and I was interested in it. Thanks for the clarification.