Can't do Segmentation: CUDA error: device-side assert triggered

(xnet) #1

Could anyone help with troubleshooting the "CUDA error: device-side assert triggered" error?

I’ve figured it’s due to some -1 value in my data, but I don’t know where. I suspect it’s the masks I’ve created, since the camvid datasets work well.

Basically, I manually create masks in numpy and save them as such:
img = Image.fromarray((mask * 255).astype('uint8'), mode='L')
img.save(savefile, bit=1)
where mask is the numpy array

Then, I use these PNGs as my segmentation masks. There are only 2 classes, so the masks should contain only 0s and 1s, and I manually pass in the codes as
codes = np.asarray(['void','seam'])

There’s also some output error on the command line, but I’m not sure how to troubleshoot this:

/opt/conda/conda-bld/pytorch_1549287501208/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed.

I should also add that the masks look fine when I visualize them with data.show_batch()

(Imanol Pinto) #2

I have the same issue, also using PNG files as masks

(Sunhwan Jo) #3

I think there should not be any gaps between the integer values in your masks. Because you are multiplying by 255, you end up with the values 0 and 255 (if your input mask array contains only 0 and 1). Check with np.unique(img.getdata()) and make sure it returns [0, 1].
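To make that check concrete, here is a minimal NumPy-only sketch. The helper name out_of_range_labels is hypothetical (it is not part of fastai or PyTorch); it simply lists the mask values that would violate the `t >= 0 && t < n_classes` assertion from the error message. The small array stands in for a mask loaded from a PNG.

```python
import numpy as np

def out_of_range_labels(mask, n_classes):
    """Return the sorted unique mask values that are not valid class indices."""
    values = np.unique(mask)
    return values[(values < 0) | (values >= n_classes)]

# A binary mask scaled by 255 before saving, as in the first post.
mask = np.array([[0, 1], [1, 0]])
scaled = (mask * 255).astype("uint8")

print(np.unique(scaled))               # values that end up in the saved image
print(out_of_range_labels(scaled, 2))  # labels a 2-class loss would reject
print(out_of_range_labels(mask, 2))    # empty: every label is 0 or 1
```

If the second print shows anything, the loss will see targets outside the class range, which matches the assertion in the traceback.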

(Renato Hermoza) #4

When defining the DataBunch, have you specified the classes?

(Imanol Pinto) #5

I have found the solution. The problem is that my masks have the following values: [0, 255], and this is not supported by default by fast.ai. The problem is that the fast.ai function open_mask works with small mask pixel values like 0,1,2,3 by default. This function is called by the SegmentationLabelList class. To work with 255 values we should call the function open_mask with div=True: open_mask(fn, div=True). It divides the mask pixel values by 255.
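As a rough illustration of why div=True helps, the sketch below mimics the effect on the pixel values with plain NumPy. It is not fastai's actual implementation; the function mask_to_class_indices and its div argument are hypothetical stand-ins that assume the mask is read as floats, optionally divided by 255, and then cast to integer class indices.

```python
import numpy as np

def mask_to_class_indices(pixels, div=False):
    """Convert raw mask pixels to integer class indices (illustrative only)."""
    x = np.asarray(pixels, dtype=np.float32)
    if div:
        x = x / 255.0          # 0 stays 0, 255 becomes 1.0
    return x.astype(np.int64)  # cast truncates toward zero

# Pixels as they would be stored in a {0, 255} PNG mask.
png_pixels = np.array([[0, 255], [255, 0]], dtype=np.uint8)

print(mask_to_class_indices(png_pixels))            # contains 255: out of range for 2 classes
print(mask_to_class_indices(png_pixels, div=True))  # contains only 0 and 1
```

Note that under this assumption any intermediate gray value (for example 128) would be truncated to 0, so this only suits masks that are strictly 0 and 255.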

In order to change the default behaviour of open_mask, I did the following:

class SegLabelListCustom(SegmentationLabelList):
    def open(self, fn): return open_mask(fn, div=True)

class SegItemListCustom(SegmentationItemList):
    _label_cls = SegLabelListCustom

codes = ['0','1']
src = (SegItemListCustom.from_folder(path_img)
       .random_split_by_pct(valid_pct=0.2, seed=33)
       .label_from_func(get_y_fn, classes=codes))

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

Hope this helps. Thank you for your replies!

#6

Hi, I used your code and the error is gone, but now when I view data.show_batch I see no mask. And if I run "lr_find(learn); learn.recorder.plot()" after the learner is created, the loss is 0, which confirms that the mask is gone…
How can I fix that? Thanks

#7

I used "get_y_fn = lambda x: pathmask_train/f'{x.stem}_json.png'" before the modification.

(hasif) #8

Thanks @imanol, it works!

(Danielh Carranza) #9

I have the same issue. Did you solve it?

#10

Hey, guys! I’ve partly fixed the problem (by using the COCO dataset format). Now I can use lr_find(learn) and learn.recorder.plot() without error. Also see this thread: Image Segmentation on COCO dataset - summary, questions and suggestions.

I think there are 2 problems behind this error: 1) The values in the mask array should be {0,1}, not {0,255} or anything else, as many other people have said. The thread linked above shows ways to generate {0,1} masks. I haven’t tried fixing it without transforming the data into COCO form. 2) The class list. Mine is a binary segmentation project (to segment "row"), so CATEGORY_NAMES=[0, 'row']. With CATEGORY_NAMES=['row'], I found the loss was always 0 when using lr_find(learn) and learn.recorder.plot().
I think there is still more to figure out, but for now, at least I can use unet and lr_find(learn).
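One plausible explanation for the constant zero loss with a single-entry class list, sketched here in plain NumPy rather than with fastai or PyTorch: if the model has only one output class, softmax assigns that class probability 1 at every pixel, so the cross-entropy is -log(1) = 0 regardless of the logits. The cross_entropy helper below is a hypothetical stand-in, not the loss the library actually calls.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy; logits has shape (n_pixels, n_classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

rng = np.random.default_rng(0)

# One class (e.g. only 'row'): the loss cannot be anything but zero.
one_class = cross_entropy(rng.normal(size=(4, 1)), np.zeros(4, dtype=int))

# Two classes (background plus 'row'): the loss depends on the logits.
two_class = cross_entropy(rng.normal(size=(4, 2)), np.array([0, 1, 1, 0]))

print(one_class, two_class)
```

Under that assumption, adding an explicit background entry to the class list gives the loss something to distinguish, which is consistent with what was observed above.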

#11

I transformed the data to COCO format and added 0 to the class list (e.g. CATEGORY_NAMES=[0, 'row'] instead of ['row']), and it works.
I also updated fastai to 1.0.48, after which the data.show_batch error disappeared, though I don’t know why.

(Simone Massaro) #12

It works! thank you

(Divyanshu Sharma) #13

It works!
Shouldn’t it be reported as a bug?

(Patrick Trampert) #14

Have a look here for those that still have problems:
