Can't do Segmentation: CUDA error: device-side assert triggered

(xnet) #1

Could anyone help with troubleshooting the " CUDA error: device-side assert triggered" error?

I’ve figured it’s due to some -1 value in my data, but I don’t know where. I suspect it’s the masks I’ve created, since the camvid datasets work well.

Basically, I manually create masks in numpy and save them as such:
img = Image.fromarray((mask * 255).astype('uint8'), mode='L')
img.save(savefile, bit=1)
where mask is the numpy array

Then, I use this PNG as my segmentation masks. There’s only 2 classes and so only 0 and 1s, and I manually pass in the codes as
codes = np.asarray(['void','seam'])

There’s also some output error on the command line, but I’m not sure how to troubleshoot this:

/opt/conda/conda-bld/pytorch_1549287501208/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed.

I should also add that the masks looks fine when I visualize them with data.show_batch()

1 Like

(Imanol Pinto) #2

I have the same issue, also using PNG files as masks

0 Likes

WideResNet (wrn_22) Cuda Error in Lesson1-pets
(Sunhwan Jo) #3

I think there should not be any gap between your masked integer values. Because you are multiplying 255, you will end up having values 0 and 255 (if your input mask array is 0 and 1 only). Check with np.unique(img.getdata()) and made sure it returns [0, 1].

0 Likes

(Renato Hermoza) #4

When defining the DataBunch, have you specified the classes?

0 Likes

(Imanol Pinto) #5

I have found the solution. The problem is that my masks have the following values: [0, 255], and this is not supported by default by fast.ai. The problem is that the fast.ai function open_mask works with small mask pixel values like 0,1,2,3 by default. This function is called by the SegmentationLabelList class. To work with 255 values we should call the function open_mask with div=True: open_mask(fn, div=True). It divides the mask pixel values by 255.

In order to change the default behaviour of open_mask, I did the following:

class SegLabelListCustom(SegmentationLabelList):
    def open(self, fn): return open_mask(fn, div=True)

class SegItemListCustom(SegmentationItemList):
    _label_cls = SegLabelListCustom

codes = ['0','1']
src = (SegItemListCustom.from_folder(path_img)
       .random_split_by_pct(valid_pct=0.2, seed=33)
       .label_from_func(get_y_fn, classes=codes))

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

Hope this helps. Thank you for your replies!

9 Likes

RuntimeError: CUDA error: device-side assert triggered when using unet_learner
#6

Hi, I used your codes, the error gone, but now by viewing data.show_batch, I found there is no mask. And if try" lr_find(learn);learn.recorder.plot()" after the learner created, the loss is 0, which confirm that the mask is gone…
How to fix that? Thanks

1 Like

#7

I used “get_y_fn = lambda x: pathmask_train/f’{x.stem}_json.png’” before modification

0 Likes

(hasif) #8

Thanks @imanol, it works!

1 Like

(Danielh Carranza) #9

I have the same issue, Did you solve it?

0 Likes

#10

Hey, guys! I’ve just partly fixed the problem(fix the problem using COCO dataset). Now, I can use lr_find(learn) and learn.recorder.plot() without error. Also see this link:Image Segmentation on COCO dataset - summary, questions and suggestions.

I think there are 2 problems in this error: 1)The value of the mask array should be {0,1}, not {0,255}or some other stuff, that’s what many other people said. The link above find ways to generate {0,1} mask. But how to fix it by not transforming data into COCO form, I haven’t tried; 2)The class list. Mine is a bynary segmentation project(to segment “row” ), so CATEGORY_NAMES=[0, ‘row’]. If CATEGORY_NAMES=[‘row’], I found the loss would always be 0, when using lr_find(learn) and learn.recorder.plot().
I think there still are something more to figure out, but for now, at leat I can use unet and lr_find(learn).

0 Likes

#11

I transformed data to COCO, and added 0 to class list(CATEGORY_NAMES=[0, ‘row’], instead[‘row’], e.g.), and it works.
I also updated fastai to 1.0.48, then the data.show_batch error disappeared, though I don’t know why.

0 Likes

(Simone Massaro) #12

It works! thank you

0 Likes

(Divyanshu Sharma) #13

It works!
Should’t it be reported as a bug ?

0 Likes

(Patrick Trampert) #14

Have a look here for those that still have problems:

0 Likes

(Nithin Varghese) #15

Thanks so much, much needed one

0 Likes

#16
class SegLabelListCustom(SegmentationLabelList):
    def open(self, fn): return open_mask(fn, div=True)

class SegItemListCustom(SegmentationItemList):
    _label_cls = SegLabelListCustom

codes = ['0','1']
src = (SegItemListCustom.from_folder(path_img)
       .random_split_by_pct(valid_pct=0.2, seed=33)
       .label_from_func(get_y_fn, classes=codes))

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

Is it normal for this code to take quite a while to run? I’ve had it running for well over an hour with just 5000 128x128 images in the dataset. Hoping it will work but I’m not sure what’s going on. I only changed the paths.

0 Likes