Can't do Segmentation: CUDA error: device-side assert triggered


(xnet) #1

Could anyone help with troubleshooting the " CUDA error: device-side assert triggered" error?

I’ve figured it’s due to some -1 value in my data, but I don’t know where. I suspect it’s the masks I’ve created, since the camvid datasets work well.

Basically, I manually create masks in numpy and save them as such:
img = Image.fromarray((mask * 255).astype('uint8'), mode='L')
img.save(savefile, bit=1)
where mask is the numpy array

Then, I use this PNG as my segmentation masks. There’s only 2 classes and so only 0 and 1s, and I manually pass in the codes as
codes = np.asarray(['void','seam'])

There’s also some output error on the command line, but I’m not sure how to troubleshoot this:

/opt/conda/conda-bld/pytorch_1549287501208/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed.

I should also add that the masks looks fine when I visualize them with data.show_batch()


(Imanol Pinto) #2

I have the same issue, also using PNG files as masks


(Sunhwan Jo) #3

I think there should not be any gap between your masked integer values. Because you are multiplying 255, you will end up having values 0 and 255 (if your input mask array is 0 and 1 only). Check with np.unique(img.getdata()) and made sure it returns [0, 1].


(Renato Hermoza) #4

When defining the DataBunch, have you specified the classes?


(Imanol Pinto) #5

I have found the solution. The problem is that my masks have the following values: [0, 255], and this is not supported by default by fast.ai. The problem is that the fast.ai function open_mask works with small mask pixel values like 0,1,2,3 by default. This function is called by the SegmentationLabelList class. To work with 255 values we should call the function open_mask with div=True: open_mask(fn, div=True). It divides the mask pixel values by 255.

In order to change the default behaviour of open_mask, I did the following:

class SegLabelListCustom(SegmentationLabelList):
    def open(self, fn): return open_mask(fn, div=True)

class SegItemListCustom(SegmentationItemList):
    _label_cls = SegLabelListCustom

codes = ['0','1']
src = (SegItemListCustom.from_folder(path_img)
       .random_split_by_pct(valid_pct=0.2, seed=33)
       .label_from_func(get_y_fn, classes=codes))

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

Hope this helps. Thank you for your replies!


RuntimeError: CUDA error: device-side assert triggered when using unet_learner
#6

Hi, I used your codes, the error gone, but now by viewing data.show_batch, I found there is no mask. And if try" lr_find(learn);learn.recorder.plot()" after the learner created, the loss is 0, which confirm that the mask is gone…
How to fix that? Thanks


#7

I used “get_y_fn = lambda x: pathmask_train/f’{x.stem}_json.png’” before modification


(hasif) #8

Thanks @imanol, it works!


(Danielh Carranza) #9

I have the same issue, Did you solve it?