Image Segmentation: RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

anrk · February 24, 2021, 8:37pm

I am working on an image segmentation problem. The data is from the DeepGlobe Land Cover Classification dataset, with masks of seven classes. I am trying to use the fastai DataBlock API to gather and preprocess the data and train a U-Net architecture. The Datasets and DataLoaders seem to work fine, and even learner.summary returns valid output. However, learner.fit_one_cycle returns the following error:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

The link to my colab notebook is as follows.

I would really appreciate any guidance with this problem. Thank you.

stantonius · February 26, 2021, 3:54pm

Hi @anrk, I have seen this error before, but I cannot open this Colab notebook to verify. Could you share it publicly, please?

anrk · February 26, 2021, 5:29pm

Hi @stantonius, Here is another link. Please let me know if you can view it now.

Thank you for your response.

stantonius · February 26, 2021, 6:49pm

Try running without GPU enabled. This is a good troubleshooting technique as I often find the GPU errors don’t make sense (like in this case).

Doing this, you will see that the actual error is that your targets are out of bounds (meaning the mask contains values outside the range of class indices the model predicts). What I can tell so far is that setting the codes in MaskBlock(codes=codes) isn’t actually changing the mask values, so you might look to set the codes differently?

Could you also try one-hot encoding these values and treating this as a multiclass segmentation problem?
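To make the "targets are out of bounds" diagnosis concrete, here is a minimal sketch in plain PyTorch on CPU. It assumes the learner uses a cross-entropy loss over the class dimension; the tensor shapes, the random values, and the helper name try_loss are made up for illustration and are not taken from the notebook.

```python
import torch
import torch.nn.functional as F

n_classes = 7                                     # seven land-cover classes
logits = torch.randn(1, n_classes, 4, 4)          # fake model output: (batch, classes, H, W)
target = torch.zeros(1, 4, 4, dtype=torch.long)   # fake mask: (batch, H, W)

def try_loss(logits, target):
    """Return the loss as a float, or the error message if the loss cannot be computed."""
    try:
        return F.cross_entropy(logits, target).item()
    except (IndexError, RuntimeError) as e:       # exception type differs between PyTorch versions
        return f"{type(e).__name__}: {e}"

bad_target = target.clone()
bad_target[0, 0, 0] = 105                         # a raw pixel value, not a class index
print(try_loss(logits, bad_target))               # on CPU, an error message about the target

good_target = target.clone()
good_target[0, 0, 0] = 3                          # a valid class index in [0, 6]
print(try_loss(logits, good_target))              # a finite loss value
```

On the GPU the same mistake tends to surface as an unrelated-looking CUDA or cuDNN error, which is why the CPU run is more informative.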

anrk · February 26, 2021, 7:18pm

Hi @stantonius,
Thank you for taking a look at the notebook.

I also don’t think it’s a GPU error because I have run other similar segmentation tasks in the same environment successfully. However, in this notebook, when I look at the image and mask with dls.one_batch(), the resulting tensors have a device='cuda:0' tag at the end. Do you have any idea why that would be? I haven’t seen such a tag in other problems.

Also, I am curious how one can tell whether MaskBlock(codes=codes) is working. dsets.show_at() does show an image with its mask, which led me to believe that the DataBlock API was working.

Do you mean I should one-hot encode the codes and pass them to MaskBlock()?

Thank you!

stantonius · February 28, 2021, 10:20pm

Did you try running the notebook without the GPU enabled to see what the actual error is?

I think the device='cuda:0' tag is normal behaviour when the GPU is enabled and you have created the DataLoaders.

I only guessed the codes weren’t working because when I removed them nothing changed.

When you run the notebook without the GPU enabled, you will see the error suggests that the target value is out of range. This is because, for each pixel, the model predicts a vector of 7 scores (one per class), but you have not told fastai what each value in that vector actually represents (i.e. there’s no mapping of target pixel value 105 to the codes array you provided). If you’re unsure what the model actually outputs when troubleshooting, you can quickly see by running the following after the learner is created:

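import torch

xb, yb = dls.one_batch()   # one batch of images and masks
learn.eval()               # put the model in evaluation mode
with torch.no_grad():      # no gradients needed just to inspect the output
    out = learn.model(xb)
out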
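Alongside inspecting the model output, it can help to compare the values stored in a mask with the number of classes. The following is a small sketch using NumPy; the function name out_of_range_values and the toy mask are hypothetical, and apart from 105 (the value mentioned above) the pixel values are made up.

```python
import numpy as np

def out_of_range_values(mask, n_classes):
    """Return the sorted mask values that are not valid class indices 0..n_classes-1."""
    values = np.unique(mask)
    return values[(values < 0) | (values >= n_classes)].tolist()

# Toy mask standing in for one mask image loaded as an array.
mask = np.array([[0, 0, 105],
                 [29, 105, 0]], dtype=np.uint8)

print(out_of_range_values(mask, n_classes=7))  # → [29, 105]
```

An empty list would mean every pixel is already a usable class index; anything else points to values that need to be remapped before training.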

Take a look at this notebook by Zach Mueller (not tagging him as I don’t want to spam :). Although it covers binary segmentation, you can see how he converted the pixel values from the mask to something fastai can make sense of. You can take the same approach for your problem.

Don’t worry about one-hot encoding for now. Upon further research, it is not needed if you get this setup correct (as evidenced by Zach’s notebook).
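As a rough illustration of that kind of conversion (a sketch under assumptions, not the code from the notebook), the raw pixel values can be mapped to consecutive class indices with NumPy. The helper name remap_mask and the toy pixel values are hypothetical; the order of pixel_values would need to match the order of the codes array.

```python
import numpy as np

def remap_mask(mask, pixel_values):
    """Map each raw pixel value to its position in `pixel_values` (0, 1, 2, ...)."""
    mask = np.asarray(mask)
    remapped = np.zeros(mask.shape, dtype=np.uint8)
    for class_idx, value in enumerate(pixel_values):
        remapped[mask == value] = class_idx
    # Note: values missing from `pixel_values` silently stay at class 0.
    return remapped

raw = np.array([[0, 105, 255],
                [105, 0, 255]], dtype=np.uint8)
pixel_values = sorted(np.unique(raw).tolist())   # [0, 105, 255]

print(remap_mask(raw, pixel_values))
# [[0 1 2]
#  [1 0 2]]
```

In a fastai pipeline, a function like this would presumably be applied where the mask is loaded (for example in the function that returns the target for each item), so that the masks reaching the loss contain only values from 0 to 6.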

anrk · March 2, 2021, 3:06pm

@stantonius,

Thank you for digging out the notebook. This gives a solution to the problem.

The notebook contains the following line. Does this mean that fastai expects all the mask points to be consecutive numbers?
“Now, our mask isn’t set up how fastai expects, in which the mask points are not all in a row. We need to change this:”
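One way to see why consecutive values starting at 0 matter: a cross-entropy loss uses each mask value as an index into the per-pixel class scores. The pure-Python sketch below mimics this for a single pixel; the scores are made up and pixel_cross_entropy is a hypothetical helper, not a fastai or PyTorch function.

```python
import math

def pixel_cross_entropy(scores, target):
    """Cross-entropy for one pixel: minus the log-softmax score at index `target`."""
    log_sum_exp = math.log(sum(math.exp(s) for s in scores))
    return -(scores[target] - log_sum_exp)   # scores[target] fails if target >= len(scores)

scores = [0.1, 2.0, -1.0, 0.3, 0.0, 1.2, -0.5]   # made-up scores for 7 classes

print(pixel_cross_entropy(scores, 1))            # fine: 1 is a valid class index

try:
    pixel_cross_entropy(scores, 105)             # raw pixel value used as an index
except IndexError as e:
    print("IndexError:", e)
```

With seven classes only the indices 0 to 6 exist, so a mask value such as 105 has nothing to point at unless it is first converted to a class index.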

Thanks again. I really appreciate your help here.