Using fastai for Segmentation, receiving a CUDA device-side assertion error

The accuracy metric no longer works for image segmentation with the latest fastai version. You can try using other metrics like dice or iou.

def dice(pred, targs):
    pred = (pred>0).float()
    return 2. * (pred*targs).sum() / (pred+targs).sum()

def iou(input:Tensor, targs:Tensor) -> Rank0Tensor:
    "IoU coefficient metric for binary target."
    n = targs.shape[0]
    input = input.argmax(dim=1).view(n,-1)
    targs = targs.view(n,-1)
    intersect = (input*targs).sum().float()
    union = (input+targs).sum().float()
    return intersect / (union-intersect+1.0)
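As a quick sanity check of the two formulas above, here is a minimal NumPy sketch on a tiny 2x2 binary mask. The `dice_np` and `iou_np` names are hypothetical NumPy mirrors written for illustration, not fastai functions, and they take already-binarized predictions rather than raw model output.

```python
import numpy as np

def dice_np(pred, targ):
    # 2 * overlap / (pixels in pred + pixels in targ); both masks hold only 0s and 1s
    return 2.0 * (pred * targ).sum() / (pred + targ).sum()

def iou_np(pred, targ):
    # Same smoothing as the iou metric above: +1.0 in the denominator
    intersect = float((pred * targ).sum())
    union = float((pred + targ).sum())
    return intersect / (union - intersect + 1.0)

pred = np.array([[1, 1], [0, 0]])
targ = np.array([[1, 0], [0, 0]])

dice_value = dice_np(pred, targ)  # 2*1 / (2+1)
iou_value = iou_np(pred, targ)    # 1 / (3-1+1)
print(dice_value, iou_value)
```

Both formulas assume the target holds only 0s and 1s. With a mask encoded as 0 and 255 the products and sums are inflated, and the resulting values stop being meaningful.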

Thanks! Yes, I ended up using iou and dice.

Hi,
None of this solved the problem for me.
Any suggestions to debug this issue?


A CUDA device-side assertion is very generic: it just means you have a bad index somewhere. It’s impossible to debug without seeing your code.
One thing that might help is to run the same thing on the CPU first, because you’ll get a clearer error message.
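To illustrate the point about running on the CPU, here is a minimal sketch in plain PyTorch (not the poster's code): a cross-entropy call with one deliberately out-of-range target. The tensor shapes and the value 255 are made up for the example.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 2)              # 4 "pixels", 2 classes, all on the CPU
target = torch.tensor([0, 1, 255, 0])   # 255 is not a valid class index for 2 classes

error = None
try:
    F.cross_entropy(logits, target)
except (IndexError, RuntimeError) as e:  # the exception type varies across PyTorch versions
    error = e

print(type(error).__name__, error)
```

On the CPU the exception is raised right at the loss call, so the stack trace points at the targets rather than at an unrelated later CUDA operation.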

Could you post how we should instantiate a model from folders “images” and “masks” like the carvana example?
I am trying to do this, but it does not seem to work:

src = (SegmentationItemList.from_folder(path_img)
       .random_split_by_pct(0.2)
       .label_from_func(get_y_fn, classes=[0,1]))

Hi @tcapelle ,
This is what I did:


where my classes are: [‘background’, ‘person’]

If you like you can join the thread I opened on image segmentation: Image Segmentation on COCO dataset - summary, questions and suggestions

Maybe this can help both of us and also other people :wink:

can you plot the output of src please?

Yes, here it is:

As you can see in the post I linked, I may have some problems with my data because the images have different shapes (?)

If I disable the GPU, to trace back the error, I get this:

RuntimeError                              Traceback (most recent call last)

<ipython-input-19-f2e08e2ffc17> in <module>()
----> 1 learn.lr_find(); learn.recorder.plot()

/usr/local/lib/python3.6/dist-packages/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, stop_div, **kwargs)
     30     cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
     31     a = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 32     learn.fit(a, start_lr, callbacks=[cb], **kwargs)
     33 
     34 def to_fp16(learn:Learner, loss_scale:float=512., flat_master:bool=False)->Learner:

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    176         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    177         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 178             callbacks=self.callbacks+callbacks)
    179 
    180     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/usr/local/lib/python3.6/dist-packages/fastai/utils/mem.py in wrapper(*args, **kwargs)
     83 
     84         try:
---> 85             return func(*args, **kwargs)
     86         except Exception as e:
     87             if "CUDA out of memory" in str(e) or tb_clear_frames=="1":

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     98     except Exception as e:
     99         exception = e
--> 100         raise e
    101     finally: cb_handler.on_train_end(exception)
    102 

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     88             for xb,yb in progress_bar(data.train_dl, parent=pbar):
     89                 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 90                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     91                 if cb_handler.on_batch_end(loss): break
     92 

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     22 
     23     if not loss_func: return to_detach(out), yb[0].detach()
---> 24     loss = loss_func(out, *yb)
     25 
     26     if opt is not None:

/usr/local/lib/python3.6/dist-packages/fastai/layers.py in __call__(self, input, target, **kwargs)
    229         if self.floatify: target = target.float()
    230         input = input.view(-1,input.shape[-1]) if self.is_2d else input.view(-1)
--> 231         return self.func.__call__(input, target.view(-1), **kwargs)
    232 
    233 def CrossEntropyFlat(*args, axis:int=-1, **kwargs):

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
    902     def forward(self, input, target):
    903         return F.cross_entropy(input, target, weight=self.weight,
--> 904                                ignore_index=self.ignore_index, reduction=self.reduction)
    905 
    906 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   1968     if size_average is not None or reduce is not None:
   1969         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 1970     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   1971 
   1972 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   1788                          .format(input.size(0), target.size(0)))
   1789     if dim == 2:
-> 1790         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   1791     elif dim == 4:
   1792         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at /pytorch/aten/src/THNN/generic/ClassNLLCriterion.c:93

So it is not choosing the right loss, as it should be binary cross-entropy (predicting between 0 and 1).

Try to reduce batch size

You can predict 0 or 1 with CrossEntropy (as long as it can’t be 0 and 1 at the same time, which is the case in segmentation). Your problem is the same as everyone else’s: your mask is encoded with 0s and 255s, not 0s and 1s, so you need to subclass SegmentationItemList and its open method to return open_mask(fn, div=True).
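A minimal sketch of the check described above, using NumPy on a hand-made array instead of a real mask file. `check_mask` is a hypothetical helper, not part of fastai; it mirrors the `cur_target >= 0 && cur_target < n_classes` assertion from the traceback.

```python
import numpy as np

def check_mask(mask, n_classes):
    "Return the mask values that are not valid class indices for `n_classes`."
    values = np.unique(mask)
    return values[(values < 0) | (values >= n_classes)]

raw = np.array([[0, 255], [255, 0]], dtype=np.uint8)  # mask as saved on disk
bad_before = check_mask(raw, n_classes=2)             # 255 is out of range

scaled = (raw / 255).astype(np.int64)                 # what dividing by 255 achieves
bad_after = check_mask(scaled, n_classes=2)           # nothing left out of range

print(bad_before, bad_after)
```

Running a check like this over a handful of mask files before training is usually quicker than decoding the assertion after the fact.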


Thanks, it appears to be that. Do you know why open_image returns values in [0,255] instead of [0,1]? My images are saved as 1-bit with PIL.
This solved the issue:

class MySegmentationLabelList(SegmentationLabelList):
    def open(self, fn): return open_mask(fn, div=True)

class MySegmentationItemList(ImageItemList):
    "`ItemList` suitable for segmentation tasks."
    _label_cls,_square_show_res = MySegmentationLabelList,False

It depends on how your masks are encoded I guess.

Can you also show how you use those classes to create databunch?

I am trying to solve this problem, any tips on how I could get good segmentation masks?

src = (MySegmentationItemList.from_folder(path_img)
       .random_split_by_pct(.2)
       .label_from_func(get_y_fn, classes=['background','solar_module']))
data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

[quote=“sgugger, post:3, topic:30292”]
Not that any time you get a CUDA error, you have to rest
[/quote]
But when I use:

mask = open_mask(get_y_fn(img_f), div=True)
mask.show(figsize=(5,5), alpha=1)

the mask shows as all black. So there is no mask now, right?

My images are .jpg and my masks are .png. I tried mask.data[0][50][20:600] to look at the data and found that lots of values are 38, and the others are 0.

If it’s a binary task and you have values of 38 and 0, you must either divide your masks by 38 or set every value greater than 0 to 1. For example, here’s some code that I use in a similar case:

def open_mk(fn:PathOrStr, div:bool=False, convert_mode:str='L', cls:type=ImageSegment,
        after_open:Callable=None)->Image:
    "Return `Image` object created from image in file `fn`."
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning) # EXIF warning from TiffPlugin
        x = PIL.Image.open(fn).convert(convert_mode)
    if after_open: x = after_open(x)
    x = pil2tensor(x,np.float32)
    x[x>0]=1  # MODIFIED: map every non-zero pixel to class 1
    if div: x.div_(255)
    return cls(x)

class CustomSegmentationLabelList(SegmentationLabelList):
    def open(self,fn): return open_mk(fn)
    
class CustomSegmentationItemList(ImageList):
    _label_cls= CustomSegmentationLabelList

For all segmentation tasks, make sure your labels start at 0 and run up to the number of classes minus 1. For 10 classes, your labels should be 0,…,9.
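If the mask values are arbitrary (say 0, 38 and 200), one way to map them onto 0 to n-1 is sketched below with NumPy. `remap_labels` is a hypothetical helper; it assigns indices in sorted order of the values it finds, which is an assumption you should check against your own class list.

```python
import numpy as np

def remap_labels(mask):
    "Map the sorted unique values of `mask` to 0..n-1, keeping the shape."
    values, inverse = np.unique(mask, return_inverse=True)
    return inverse.reshape(mask.shape), values

mask = np.array([[0, 38], [200, 38]], dtype=np.uint8)
remapped, original_values = remap_labels(mask)
print(remapped.tolist(), original_values.tolist())
```

Because the mapping is derived from the values present in one array, masks that do not contain every class would get shifted indices. In that case, build the value list once for the whole dataset and apply the same mapping to every mask.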

The best approach is to preprocess the masks and save them to disk in the correct format. ImageJ / Fiji works well for this task, and it also lets you check that everything fits. You can do it in Python too, but transforming your data on the fly while loading it for training is too resource-intensive, so I would not do that.
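If you do want to do the preprocessing in Python, a one-off pass over the mask folder could look like the sketch below. It assumes binary masks stored as PNG files where every non-zero pixel is foreground; `binarize_masks` and the folder arguments are hypothetical, and Pillow plus NumPy are assumed to be installed.

```python
from pathlib import Path

import numpy as np
from PIL import Image

def binarize_masks(src_dir, dst_dir):
    "Rewrite every PNG mask in `src_dir` as a 0/1 mask in `dst_dir`; return the file count."
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for fn in sorted(src_dir.glob("*.png")):
        mask = np.array(Image.open(fn).convert("L"))   # single-channel, values 0..255
        fixed = (mask > 0).astype(np.uint8)            # every non-zero pixel becomes class 1
        Image.fromarray(fixed).save(dst_dir / fn.name) # PNG is lossless, so 0/1 survives
        count += 1
    return count
```

Masks saved this way look almost entirely black in an image viewer, since 1 out of 255 is barely visible. That is expected and does not mean the mask is empty.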

@alex_zhang, has the segmentation worked for you?