RuntimeError: CUDA error: device-side assert triggered when using unet_learner

Please have a look at the following notebook.

https://colab.research.google.com/drive/1XQi38tzon0Pvrd1UCluXcdbqk1H0JmC4

We’re getting RuntimeError: CUDA error: device-side assert triggered when invoking unet_learner and can’t figure out why.

A Stack Overflow answer says that:

In general, when encountering CUDA runtime errors, it is advisable to run your program again using the CUDA_LAUNCH_BLOCKING=1 flag to obtain an accurate stack trace

Is there a way to do this from Google Colab?
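
For what it's worth, here is a minimal sketch of one way to set that flag from a notebook such as Colab. It assumes the cell runs before anything has initialized CUDA, which in practice means restarting the runtime and making this the first cell. Only the variable name CUDA_LAUNCH_BLOCKING comes from the quoted answer; the rest is illustrative.

```python
import os

# Must be set before CUDA is initialized, so run this in the first cell
# after restarting the runtime, before importing torch or fastai.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Sanity check that the variable is visible to this process.
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

With the flag set, kernel launches run synchronously, so the Python traceback should point at the call that actually failed rather than at a later one.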

Also:

the targets of your data were too high (or low) for the specified number of classes.

Which doesn’t seem applicable.

Edit: just found this answer and trying it.

I’ve run into errors like “device-side assert triggered” a couple of times in recent days, and the solution was always to run the offending cell on the CPU rather than the GPU (with .cpu(), for example). That didn’t make the error go away, but it made the message clearer: it showed the actual error that went wrong. As I understand it, “device-side assert triggered” is just a general message indicating that some error occurred in the GPU computation; it doesn’t tell you which one.
Hope that helps you move toward a solution.
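
To illustrate the idea, here is a small sketch that reproduces an out-of-range target on the CPU with plain PyTorch. The tensors are made up for the example, and the exact exception type and wording depend on the PyTorch version, so the code catches both kinds it is likely to raise.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy batch: 4 samples, 2 classes, one target (255) out of range.
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 255, 1])

try:
    # Everything is on the CPU here, so the failure is reported synchronously.
    F.cross_entropy(logits, targets)
    message = None
except (IndexError, RuntimeError) as err:
    # The exact exception type and wording depend on the PyTorch version.
    message = str(err)

print(message)
```

On the GPU the same mistake would typically surface only as the generic device-side assert.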

I ran into a similar error and it was because I had label values in my mask that were greater than the number of classes I supplied to the learner. Have a look and see if there is a mismatch between the class label values of your mask, and the number of classes you are supplying.
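
A quick way to look for such a mismatch, sketched with NumPy. The helper name invalid_labels and the toy mask are made up for illustration; in practice you would load each mask file into an array first.

```python
import numpy as np

def invalid_labels(mask, n_classes):
    """Return the sorted mask values that fall outside [0, n_classes)."""
    values = np.unique(mask)
    return values[(values < 0) | (values >= n_classes)]

# Hypothetical 2-class mask saved as black/white (0 and 255).
mask = np.array([[0, 0, 255], [0, 255, 255]], dtype=np.uint8)
bad = invalid_labels(mask, n_classes=2)
print(bad)
```

An empty result means every pixel value is a valid class index for the number of classes given.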

I ran into the same problem with a single class. My label is a .jpg file in which the object is masked in green and the rest of the area is white. I tried passing both the object and background labels as codes, and the same error came up. If I put only the object name in codes, it throws:

~/.conda/envs/fastai/lib/python3.7/site-packages/fastai/vision/data.py in process(self, ds)
370 "PreProcessor that stores the classes for segmentation."
371 def __init__(self, ds:ItemList): self.classes = ds.classes
--> 372 def process(self, ds:ItemList): ds.classes,ds.c = self.classes,len(self.classes)
373
374 class SegmentationLabelList(ImageList):

TypeError: len() of unsized object

Please suggest something to get rid of this error.

Got solved !! :partying_face:

Hi,

Could anyone please take a look at my notebook and suggest on how to resolve this error?
It seems to be memory related.

Mask size == Image Size
Mask Pixel Level == No of class

Hi @SiddharthGadekar, were you able to solve this error? I can’t see anything wrong with your notebook.

Could you give some more details? I’m in the same situation.

Were you trying to work with the Airbus Ship Detection dataset?

Why don’t you explain how you did it to help others?

Sorry for the late reply. Please follow this repository for an error-free fastai dynamic U-Net implementation for binary classification.

Please go through my notebook. It might help you.

Sorry for the late reply bro. Please go through my implementation.

And please make sure your mask is built properly:
the number of distinct pixel levels (0-255) in the mask should equal the number of classes.

Thank you.

Thanks, I faced the same issue. This solved it.

Hi,

I’ve encountered the same error - “CUDA assert error”. Following suggestions from other threads, I ran the same model on the CPU and got a possibly clearer error:

Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at /pytorch/aten/src/THNN/generic/ClassNLLCriterion.c:97

This seems to suggest that my class list is smaller than the range of pixel values in the mask. But my mask only has 2 pixel values: 0 and 255.

The classes are provided as classes = ["card","background"]

here is an example of my mask:

I can’t seem to find the issue, as I generated the masks myself from polyline data, and specifically made them only with 2 pixel values.

Just to double check, running

img_arr = cv2.imread(str(get_y_fn(img_f)))
np.unique(img_arr)

also returns only 2 values array([ 0, 255], dtype=uint8)

After writing the image, you can read it back and check what the values are at the edges. Or you can simply use 0 and 1 instead of 0 and 255.

Thanks for the answer.

I found the solution on Stack Overflow.

Since I saved the image as black and white so I could see the mask visually, I needed to normalize it when adding it to the data. Using the solution in that thread made everything work for me.
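
The linked solution isn't reproduced here, but the normalization idea can be sketched with NumPy as below. The helper name binarize_mask and the threshold of 127 are assumptions for illustration, not taken from that thread. Thresholding, rather than dividing by 255, also absorbs stray in-between values of the kind lossy formats can leave along edges.

```python
import numpy as np

def binarize_mask(mask, threshold=127):
    """Map a black/white mask to class indices: 0 for dark pixels, 1 for bright."""
    return (np.asarray(mask) > threshold).astype(np.uint8)

# Hypothetical mask with 0/255 plus a couple of in-between values such as
# those lossy compression can leave along edges.
mask = np.array([[0, 3, 255], [250, 0, 255]], dtype=np.uint8)
fixed = binarize_mask(mask)
print(np.unique(fixed))
```

In fastai v1, open_mask also accepts a div argument that scales values by 255 on load, which may be what the linked answer relies on; check it against the version you have installed.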

I am new to fastai and deep learning. I am trying image segmentation on two-class data, but I get an error as soon as I start training the model!

About data being used - "The annotation images for segmentation task are binary images in which pixels are either 1 for the foreground or 0 for the background. The annotation images named as “xxx.png”. Where “xxx” presents patient ID (from 001 to 3644). "

path_img = '/content/00000TNSCUI2020_train/TNSCUI2020_train/image'
path_lbl = '/content/00000TNSCUI2020_train/TNSCUI2020_train/mask'

fnames = get_image_files(path_img)
fnames[:3]
lbl_names = get_image_files(path_lbl)
lbl_names[:3]
img_f = fnames[0]
img = open_image(img_f)
img.show(figsize=(5,5))

codes = ['others ,thyroid']

from pathlib import Path
import os
def get_y_fn(filename):
   b = Path(path_lbl).joinpath(Path(filename).name)
   return b

mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5,5), alpha=1)

src_size = np.array(mask.shape[1:])
src_size,mask.data

size = src_size//2
bs = 4
src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file('/content/valid.txt')
       .label_from_func(get_y_fn, classes=codes))

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

metrics=accuracy
wd=1e-2
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)

lr=3e-3
learn.fit_one_cycle(10, slice(lr), pct_start=0.9)

RuntimeError: CUDA error: device-side assert triggered

The full traceback was:
RuntimeError Traceback (most recent call last)

<ipython-input-24-c2c63f494cee> in <module>()
     44 learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)
     45 lr=3e-3
---> 46 learn.fit_one_cycle(10, slice(lr), pct_start=0.9)

/usr/local/lib/python3.6/dist-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
     21     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
     22                                        final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 23     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     24 
     25 def fit_fc(learn:Learner, tot_epochs:int=1, lr:float=defaults.lr,  moms:Tuple[float,float]=(0.95,0.85), start_pct:float=0.72,

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    198         else: self.opt.lr,self.opt.wd = lr,wd
    199         callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
--> 200         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    201 
    202     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
     99             for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
    100                 xb, yb = cb_handler.on_batch_begin(xb, yb)
--> 101                 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
    102                 if cb_handler.on_batch_end(loss): break
    103 

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     31 
     32     if opt is not None:
---> 33         loss,skip_bwd = cb_handler.on_backward_begin(loss)
     34         if not skip_bwd:                     loss.backward()
     35         if not cb_handler.on_backward_end(): opt.step()

/usr/local/lib/python3.6/dist-packages/fastai/callback.py in on_backward_begin(self, loss)
    288     def on_backward_begin(self, loss:Tensor)->Tuple[Any,Any]:
    289         "Handle gradient calculation on `loss`."
--> 290         self.smoothener.add_value(loss.float().detach().cpu())
    291         self.state_dict['last_loss'], self.state_dict['smooth_loss'] = loss, self.smoothener.smooth
    292         self('backward_begin', call_mets=False)

RuntimeError: CUDA error: device-side assert triggered

Please help. Thanks in advance.
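
One thing that might be worth checking in the code above, offered as a guess rather than a confirmed diagnosis: codes is written as a single string that contains a comma, so it has length 1, while the masks are described as containing two values (0 and 1). The snippet below only shows the difference in length; the variable names are made up for the comparison.

```python
# As written in the post: one string that happens to contain a comma.
codes_as_posted = ['others ,thyroid']

# Two separate class names, one per mask value (0 and 1).
codes_split = ['others', 'thyroid']

print(len(codes_as_posted), len(codes_split))
```

If the learner is built with a single class while the masks contain the value 1, that would match the label and class-count mismatch described earlier in the thread.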