Using fastai for Segmentation, receiving a CUDA device-side assertion error

(Thomas) #18

If I disable the GPU to trace back the error, I get this:

RuntimeError                              Traceback (most recent call last)

<ipython-input-19-f2e08e2ffc17> in <module>()
----> 1 learn.lr_find(); learn.recorder.plot()

/usr/local/lib/python3.6/dist-packages/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, stop_div, **kwargs)
     30     cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
     31     a = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 32     learn.fit(a, start_lr, callbacks=[cb], **kwargs)
     33 
     34 def to_fp16(learn:Learner, loss_scale:float=512., flat_master:bool=False)->Learner:

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    176         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    177         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 178             callbacks=self.callbacks+callbacks)
    179 
    180     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/usr/local/lib/python3.6/dist-packages/fastai/utils/mem.py in wrapper(*args, **kwargs)
     83 
     84         try:
---> 85             return func(*args, **kwargs)
     86         except Exception as e:
     87             if "CUDA out of memory" in str(e) or tb_clear_frames=="1":

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     98     except Exception as e:
     99         exception = e
--> 100         raise e
    101     finally: cb_handler.on_train_end(exception)
    102 

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     88             for xb,yb in progress_bar(data.train_dl, parent=pbar):
     89                 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 90                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     91                 if cb_handler.on_batch_end(loss): break
     92 

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     22 
     23     if not loss_func: return to_detach(out), yb[0].detach()
---> 24     loss = loss_func(out, *yb)
     25 
     26     if opt is not None:

/usr/local/lib/python3.6/dist-packages/fastai/layers.py in __call__(self, input, target, **kwargs)
    229         if self.floatify: target = target.float()
    230         input = input.view(-1,input.shape[-1]) if self.is_2d else input.view(-1)
--> 231         return self.func.__call__(input, target.view(-1), **kwargs)
    232 
    233 def CrossEntropyFlat(*args, axis:int=-1, **kwargs):

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
    902     def forward(self, input, target):
    903         return F.cross_entropy(input, target, weight=self.weight,
--> 904                                ignore_index=self.ignore_index, reduction=self.reduction)
    905 
    906 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   1968     if size_average is not None or reduce is not None:
   1969         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 1970     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   1971 
   1972 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   1788                          .format(input.size(0), target.size(0)))
   1789     if dim == 2:
-> 1790         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   1791     elif dim == 4:
   1792         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at /pytorch/aten/src/THNN/generic/ClassNLLCriterion.c:93

So it is not choosing the right loss, as it should be binary cross-entropy (predicting between 0 and 1).
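For reference, this assertion fires whenever a target pixel holds a class index outside [0, n_classes), regardless of which loss is chosen. A minimal NumPy sketch (the helper name `check_mask_labels` is made up here, not a fastai function) to spot offending values in a mask before training:

```python
import numpy as np

def check_mask_labels(mask, n_classes):
    """Return the label values in `mask` that fall outside [0, n_classes)."""
    return [int(v) for v in np.unique(mask) if v < 0 or v >= n_classes]

# A binary mask saved as 0/255 trips the assertion when n_classes=2:
check_mask_labels(np.array([[0, 255], [255, 0]]), n_classes=2)
# → [255]
```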

(Pietro La Torre) #19

Try reducing the batch size.

#20

You can predict 0 or 1 with CrossEntropy (as long as it can’t be 0 and 1 at the same time, which is the case in segmentation). Your problem is the same as everyone else’s: your mask is encoded with 0s and 255s, not 0s and 1s, so you need to subclass SegmentationItemList and override its open method to return open_mask(bla, div=True).

(Thomas) #21

Thanks, it appears to be that. Do you know why open_image gives [0, 255] instead of 0 and 1? My images are saved as 1-bit with PIL.
This solved the issue:

class MySegmentationLabelList(SegmentationLabelList):
    def open(self, fn): return open_mask(fn, div=True)

class MySegmentationItemList(ImageItemList):
    "`ItemList` suitable for segmentation tasks."
    _label_cls,_square_show_res = MySegmentationLabelList,False

#22

It depends on how your masks are encoded, I guess.

(Pietro La Torre) #23

Can you also show how you use those classes to create a DataBunch?

(Thomas) #24

I am trying to solve this problem; any tips on how I could get good segmentation masks?

src = (MySegmentationItemList.from_folder(path_img)
       .random_split_by_pct(.2)
       .label_from_func(get_y_fn, classes=['background','solar_module']))
data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

#25

[quote=“sgugger, post:3, topic:30292”]
Note that any time you get a CUDA error, you have to restart
[/quote]

But when I use:

mask = open_mask(get_y_fn(img_f), div=True)
mask.show(figsize=(5,5), alpha=1)

the mask shows all black. No mask now, right?

#26

My images are .jpg and my masks are .png. I tried mask.data[0][50][20:600] to see the data and found lots of the values are 38; the others are 0.

(Danielh Carranza) #27

If it’s a binary task and you have values of 38 and 0, you must divide your masks by 38 or set every value greater than 0 to 1. For example, here’s some code that I use in a similar case:

def open_mk(fn:PathOrStr, div:bool=False, convert_mode:str='L', cls:type=ImageSegment,
        after_open:Callable=None)->Image:
    "Return `Image` object created from image in file `fn`."
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning) # EXIF warning from TiffPlugin
        x = PIL.Image.open(fn).convert(convert_mode)
    if after_open: x = after_open(x)
    x = pil2tensor(x,np.float32)
    x[x>0]=1  # MODIFIED: collapse every nonzero label to 1
    if div: x.div_(255)
    return cls(x)

class CustomSegmentationLabelList(SegmentationLabelList):
    def open(self,fn): return open_mk(fn)
    
class CustomSegmentationItemList(ImageList):
    _label_cls= CustomSegmentationLabelList

(Patrick Trampert) #28

For all segmentation tasks, make sure that your labels always start at 0 and increase to the number of classes minus 1. For 10 classes your labels should be 0,…,9.

It is best to preprocess the masks and save the correct format to disk. ImageJ / Fiji works well for this task, and it also lets you check that everything fits. You can do it with Python too, but transforming your data on the fly while loading it for training is too resource-intensive, so I would not do that.
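As a sketch of that preprocessing step (the `remap_labels` name and the example mapping are illustrative, not from the thread): remap the raw label values to contiguous class ids 0..n_classes-1 once, then save the result to disk, e.g. with PIL:

```python
import numpy as np

def remap_labels(mask, mapping):
    """Rewrite raw label values (e.g. {0: 0, 38: 1}) to contiguous class ids."""
    out = np.zeros_like(mask)
    for old, new in mapping.items():
        out[mask == old] = new
    return out

# e.g. a mask encoded with values 0 and 38 for a binary task:
remap_labels(np.array([[0, 38], [38, 0]]), {0: 0, 38: 1})
# → array([[0, 1], [1, 0]])
# Save once, e.g. PIL.Image.fromarray(out.astype('uint8')).save(dst)
```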

(hari rajeev) #29

@alex_zhang, has the segmentation worked for you?

#30

@harikrishnanrajeev Yes, I fixed it with pietro.latorre’s method with updated fastai, and I also added a '0' class to the class list; then fit_one_cycle works. Maybe I needn’t generate the mask but just load it from the COCO JSON file; I haven’t tried that yet.

(hari rajeev) #31

Does anybody know why the mask shows all black? Is this expected?

(Patrick Trampert) #32

Change the displayed range to 0 … N (where N is the number of classes).
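A small sketch of that display trick (pure NumPy; the function name is mine): stretch labels 0..N-1 over the 0..255 range for viewing only, so an ordinary image viewer no longer renders a 0/1 mask as all black. Assumes at least two classes:

```python
import numpy as np

def stretch_for_display(mask, n_classes):
    """Spread labels 0..n_classes-1 over 0..255 (for viewing only, n_classes >= 2)."""
    return (mask * (255 // (n_classes - 1))).astype(np.uint8)

stretch_for_display(np.array([[0, 1], [1, 0]]), n_classes=2)
# → array([[0, 255], [255, 0]], dtype=uint8)
```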

(Pravinvignesh S K) #33

Hey, I am also facing the same problem. I had values of 76 and 0 when I printed the mask.data values. Have you found out how to solve it? If yes, please let me know.

(hari rajeev) #34

Please have a look at the approach in this link; it works.

(Pravinvignesh S K) #35

Thanks, it works absolutely fine.

(Julian) #36

My brain was almost exploding: I could not find an error in my code, yet I could not train a segmentation model. Just knowing that the accuracy metric no longer works for segmentation saved my life. Thank you.
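For context, plain pixel accuracy can still be computed for segmentation if you skip a void/ignore label, which is roughly what fastai's CamVid-style accuracy metric does. A hedged NumPy sketch (the function name `seg_accuracy` is mine, not fastai's):

```python
import numpy as np

def seg_accuracy(pred, target, ignore_index=None):
    """Pixel accuracy over a predicted mask, optionally skipping one label."""
    keep = (np.ones(target.shape, dtype=bool) if ignore_index is None
            else target != ignore_index)
    return float((pred[keep] == target[keep]).mean())

seg_accuracy(np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 0]]))
# → 0.75  (3 of 4 pixels match)
```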

(hari rajeev) #37

It would be good to use Dice, or even Dice + BCE. Thanks.
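For reference, the Dice coefficient on binary masks is 2|A∩B| / (|A| + |B|); 1 − Dice would be the loss term to combine with BCE. A minimal NumPy sketch (function name is illustrative):

```python
import numpy as np

def dice_coeff(pred, target, eps=1e-8):
    """Dice overlap between two binary masks: 2*|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(pred, target).sum()
    return float(2 * inter / (pred.sum() + target.sum() + eps))

a = np.array([[1, 1], [0, 0]])
dice_coeff(a, a)   # perfect overlap, ~1.0
```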
