Can't do Segmentation: CUDA error: device-side assert triggered

Could anyone help with troubleshooting the “CUDA error: device-side assert triggered” error?

I’ve figured it’s due to some -1 value in my data, but I don’t know where. I suspect it’s the masks I’ve created, since the CamVid dataset works fine.

Basically, I manually create masks in NumPy and save them like this:
img = Image.fromarray((mask * 255).astype('uint8'), mode='L')
img.save(savefile, bit=1)
where mask is the NumPy array.
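
As a side note, saving the mask this way bakes the ×255 scaling into the PNG, so the file no longer contains class indices. A minimal round-trip sketch (using an in-memory buffer instead of a real file, and a small hypothetical mask) shows the pixel values the model will later read back:

```python
import io
import numpy as np
from PIL import Image

# Hypothetical binary mask: 1 where the target class is, 0 elsewhere
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1

# Saving as in the post: multiplying by 255 turns {0, 1} into {0, 255}
img = Image.fromarray((mask * 255).astype('uint8'), mode='L')
buf = io.BytesIO()
img.save(buf, format='PNG')

# Reading the PNG back shows the pixel values the mask now contains
reloaded = np.array(Image.open(io.BytesIO(buf.getvalue())))
print(sorted(np.unique(reloaded).tolist()))  # [0, 255]
```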

Then I use this PNG as my segmentation mask. There are only 2 classes, so the mask contains only 0s and 1s, and I manually pass in the codes as
codes = np.asarray(['void','seam'])

There’s also some error output on the command line, but I’m not sure how to interpret it:

/opt/conda/conda-bld/pytorch_1549287501208/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed.

I should also add that the masks look fine when I visualize them with data.show_batch().
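
(Editor's note: the assertion in the kernel output above is the key clue — every target pixel value t must satisfy 0 <= t < n_classes. With only two classes, any stray 255 or -1 in the mask trips it. A quick check can be sketched in plain NumPy, with `targets` standing in for the flattened mask values:)

```python
import numpy as np

n_classes = 2  # ['void', 'seam']
# Hypothetical flattened mask values; 255 is what an unscaled PNG mask yields
targets = np.array([0, 1, 1, 0, 255])

# Mirror the kernel's assertion: t >= 0 and t < n_classes for every pixel
bad = targets[(targets < 0) | (targets >= n_classes)]
print(bad.tolist())  # [255] -- these are the values that trip the assert
```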


I have the same issue, also using PNG files as masks


I think there should not be any gaps between your mask’s integer values. Because you are multiplying by 255, you will end up with the values 0 and 255 (if your input mask array contains only 0 and 1). Check with np.unique(img.getdata()) and make sure it returns [0, 1].


When defining the DataBunch, have you specified the classes?

I have found the solution. The problem is that my masks have the values [0, 255], which fast.ai does not support by default: the open_mask function, which is called by the SegmentationLabelList class, expects small consecutive mask pixel values like 0, 1, 2, 3. To work with 255 values, open_mask should be called with div=True: open_mask(fn, div=True). That divides the mask pixel values by 255.

In order to change the default behaviour of open_mask, I did the following:

class SegLabelListCustom(SegmentationLabelList):
    # Open masks with div=True so pixel values {0, 255} become {0, 1}
    def open(self, fn): return open_mask(fn, div=True)

class SegItemListCustom(SegmentationItemList):
    # Make the item list use the custom label list above
    _label_cls = SegLabelListCustom

codes = ['0','1']
src = (SegItemListCustom.from_folder(path_img)
       .random_split_by_pct(valid_pct=0.2, seed=33)
       .label_from_func(get_y_fn, classes=codes))

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))
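
(Editor's note: for readers without fastai at hand, the effect of div=True can be sketched roughly like this — a simplified illustration, not fastai's actual implementation, which returns an ImageSegment rather than a NumPy array:)

```python
import numpy as np

def open_mask_like(pixels, div=False):
    """Rough sketch of open_mask's div option: with div=True the raw
    pixel values are divided by 255, so {0, 255} becomes {0, 1}."""
    arr = np.asarray(pixels, dtype=np.float32)
    if div:
        arr = arr / 255.0
    return arr.astype(np.int64)

raw = [[0, 255], [255, 0]]
print(open_mask_like(raw).tolist())            # [[0, 255], [255, 0]]
print(open_mask_like(raw, div=True).tolist())  # [[0, 1], [1, 0]]
```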

Hope this helps. Thank you for your replies!


Hi, I used your code and the error is gone, but now when viewing data.show_batch I found there is no mask. And if I try lr_find(learn); learn.recorder.plot() after creating the learner, the loss is 0, which confirms that the mask is gone…
How can I fix that? Thanks


I used get_y_fn = lambda x: pathmask_train/f'{x.stem}_json.png' before the modification.

Thanks @imanol, it works!


I have the same issue. Did you solve it?


Hey guys! I’ve just partly fixed the problem (by using the COCO dataset). Now I can use lr_find(learn) and learn.recorder.plot() without errors. Also see this link: Image Segmentation on COCO dataset - summary, questions and suggestions.

I think there are 2 problems behind this error: 1) The values of the mask array should be {0, 1}, not {0, 255} or something else, as many others have said. The link above shows ways to generate {0, 1} masks, but I haven’t tried fixing it without transforming the data into COCO form. 2) The class list. Mine is a binary segmentation project (to segment “row”), so CATEGORY_NAMES = [0, ‘row’]. If CATEGORY_NAMES = [‘row’], I found the loss would always be 0 when using lr_find(learn) and learn.recorder.plot().
I think there is still more to figure out, but for now, at least I can use the unet and lr_find(learn).
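
(Editor's note: the second point can be checked mechanically — every integer that appears in the mask must be a valid index into the class list. A sketch, with illustrative class names:)

```python
# Every mask value must index into the class list: 0 <= v < len(codes)
mask_values = [0, 1]  # what a binary {0, 1} mask actually contains

codes_short = ['row']               # length 1: mask value 1 is out of range
codes_full = ['background', 'row']  # length 2: both values are valid indices

print(all(0 <= v < len(codes_short) for v in mask_values))  # False
print(all(0 <= v < len(codes_full) for v in mask_values))   # True
```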

I transformed the data to COCO and added 0 to the class list (CATEGORY_NAMES = [0, ‘row’] instead of [‘row’]), and it works.
I also updated fastai to 1.0.48, and then the data.show_batch error disappeared, though I don’t know why.

It works! thank you

It works!
Shouldn’t it be reported as a bug?

For those who still have problems, have a look here:

Thanks so much, much needed one


Is it normal for this code to take quite a while to run? I’ve had it running for well over an hour with just 5000 128x128 images in the dataset. Hoping it will work but I’m not sure what’s going on. I only changed the paths.

Did you manage to fix the show_batch problem?

Just update fastai; then show_batch will be OK.

I updated fastai to 1.0.55, but still no mask is shown.

Sorry, I didn’t figure out why it worked, but it worked when I was using a Kaggle kernel.
