CUDA error: device-side assert triggered

Hello all, I was wondering if anyone was able to solve the following error. I’m still reading about it on the forums but I’m not understanding what the error is. I’ve tried significantly reducing my data size and using different measures for accuracy.

CUDA error: device-side assert triggered

Update: Switched to CPU to get a more accurate stack trace by setting defaults.device = 'cpu'. Updated commit.
Update2: Added some of the print statements I’m using to debug what’s wrong. Getting a better idea!
Update3: Met with a mentor and she pointed out that there was no association with codes.txt in the sample notebook. In the original CamVid dataset there are RGB values associated with each class. However, this is not true in the dataset pulled with the fastai notebook. I confirmed this by cloning the sample notebook from GitHub and running it: it produced the same error I was getting. Going to try setting up those associations again.
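For anyone following along, here is a minimal sketch of what "setting up those associations" could look like: mapping each RGB color in a mask to a class index via a lookup table. The palette, the class names, and the helper `rgb_mask_to_indices` are all hypothetical and made up for illustration; they are not the real CamVid color table or anything from the fastai notebook.

```python
# Hypothetical palette: these RGB triples and class names are invented for
# illustration and are NOT the real CamVid color table.
PALETTE = {
    (128, 128, 128): "Sky",
    (128, 0, 0): "Building",
    (0, 0, 0): "Void",
}
# Class names in the order they would appear in codes.txt (assumed).
codes = ["Sky", "Building", "Void"]


def rgb_mask_to_indices(rgb_mask, palette, codes):
    """Convert rows of (R, G, B) pixels into rows of class indices."""
    name_to_index = {name: i for i, name in enumerate(codes)}
    index_mask = []
    for row in rgb_mask:
        index_row = []
        for pixel in row:
            color = tuple(pixel)
            if color not in palette:
                # Fail loudly instead of silently producing a bad label.
                raise ValueError(f"pixel color {color} has no class in the palette")
            index_row.append(name_to_index[palette[color]])
        index_mask.append(index_row)
    return index_mask


# A tiny 2x2 RGB mask as nested lists.
rgb_mask = [
    [(128, 128, 128), (128, 0, 0)],
    [(0, 0, 0), (128, 128, 128)],
]
index_mask = rgb_mask_to_indices(rgb_mask, PALETTE, codes)
print(index_mask)  # [[0, 1], [2, 0]]
```

The point of raising on an unknown color is that a mask pixel with no matching class is exactly the kind of label that later surfaces as an opaque assert on the GPU.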


Have you verified that your labels are properly indexed, as outlined in the error message at the bottom?

Just confirming if you mean the masked images (as those are called labels in the notebook) or the actual labels of the items.

I suspect the former because I tried printing this and it’s not subscriptable:

Edit: Updated with new commit going through debugging.

The former. I suspect your error is in your get_y_fn, but I’m not quite sure how to fix it - I’d need to debug it more closely. I can look further tomorrow.


That would be much appreciated!

I met with one of my data science mentors yesterday and she pointed out some very critical things, which I describe in Update3 above.

Have you looked here?

Any updates?

Hi,

Can anyone please have a look at my notebook and advise on how to resolve this error?
I'm trying to do image segmentation on the ADE20k dataset, and even a small subset of 450 images throws the "CUDA error: device-side assert triggered" error.

What happens if you execute:

import torch as pt
mask_values = pt.unique(mask.data)

and compare the length of mask_values to that of codes?

My assumption is that there is a mismatch between the pixel values in the masks and the codes. If that doesn't help, can you please replace 400 in the following line with src_size:

data = (il.transform(get_transforms(), tfm_y=True, size=400)
        .databunch(bs=2)
        .normalize(imagenet_stats))
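
To make the mismatch check above a bit more concrete, here is a rough sketch that looks at the values themselves and not only the lengths. A mask can contain the right number of distinct values and still hold one that is out of range (255 is a common "ignore" value), and a target index outside 0..len(codes)-1 is a typical trigger for a device-side assert in the loss. The helper name `find_invalid_mask_values` and the sample data are hypothetical.

```python
def find_invalid_mask_values(mask_values, codes):
    """Return the sorted mask values that are not valid indices into codes."""
    n_classes = len(codes)
    return sorted(v for v in set(mask_values) if not 0 <= v < n_classes)


# Hypothetical data. With a real mask tensor you could pass
# torch.unique(mask.data).tolist() as mask_values instead.
codes = ["Sky", "Building", "Road"]
mask_values = [0, 1, 2, 2, 255]

invalid = find_invalid_mask_values(mask_values, codes)
print(invalid)  # [255]
```

If this returns anything other than an empty list, the masks and codes disagree, and the masks need remapping (or codes needs more entries) before training.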