Lesson 9A: RuntimeError: CUDA error:

I tried to run Jonathan Whitaker’s lesson 9A notebook on Stable Diffusion DeepDive and encountered this error.

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I then set the CUDA_LAUNCH_BLOCKING=1 environment variable, but I still get the same error: RuntimeError: CUDA error: an illegal memory access was encountered
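One thing worth noting: CUDA_LAUNCH_BLOCKING only takes effect if it is set before CUDA is initialised, so in a notebook it has to run before the first import of torch. A minimal sketch, assuming you set it from Python rather than from the shell that launches Jupyter:

```python
import os

# Must run before torch (or anything else that initialises CUDA) is imported,
# otherwise the setting is silently ignored and kernel errors stay asynchronous.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

If torch has already been imported, restart the kernel first so the variable is picked up.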

I am using a Tesla K80 GPU on GCP. Did anyone else encounter the same error? Any help @johnowhitaker please?

Which line of code was it? I think I remember getting a similar error in Jonathan’s notebook when one of the variables needed to be put on the GPU instead of being on the CPU.
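A quick way to check is to print `.device` for each tensor involved and move anything that is still on the CPU. A minimal sketch, with a hypothetical latents tensor standing in for the notebook’s variables:

```python
import torch

torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for one of the notebook's tensors.
latents = torch.randn(1, 4, 64, 64)
print(latents.device)  # a freshly created tensor lives on the CPU

# Every input to a model must be on the same device as the model's weights.
latents = latents.to(torch_device)
print(latents.device)
```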

Hi Tanishq,

Thank you very much for your response. The error happens in the long cell (the 4th code cell) with the full diffusion loop at the beginning of the notebook. It happens exactly when I try to move the image to the CPU.

There is a line in the notebook which does that:
image = image.detach().cpu().permute(0, 2, 3, 1).numpy()

In order to debug, I split it into three statements and ran them in separate cells:
image = image.detach()
image = image.cpu() [Error happens in this cell]
image = image.permute(0, 2, 3, 1).numpy()

Here is the stack trace:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_26668/851303587.py in <module>
----> 1 image = image.cpu()

When I ran with the environment variable CUDA_LAUNCH_BLOCKING=1, the error happened earlier, in the vae.decode call, and the message changed to RuntimeError: Unable to find a valid cuDNN algorithm to run convolution. Here is the stack trace:


RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_20699/2034803612.py in <module>
      2 latents = 1 / 0.18215 * latents
      3 with torch.no_grad():
----> 4     image = vae.decode(latents)
      5     image = image.sample

/opt/conda/lib/python3.7/site-packages/diffusers/models/vae.py in decode(self, z, return_dict)
    575     def decode(self, z: torch.FloatTensor, return_dict: bool = True) -> Union[DecoderOutput, torch.FloatTensor]:
    576         z = self.post_quant_conv(z)
--> 577         dec = self.decoder(z)
    578 
    579         if not return_dict:

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1128         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130             return forward_call(*input, **kwargs)
   1131         # Do not call functions when jit is used
   1132         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/diffusers/models/vae.py in forward(self, z)
    208     def forward(self, z):
    209         sample = z
--> 210         sample = self.conv_in(sample)
    211 
    212         # middle

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1128         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130             return forward_call(*input, **kwargs)
   1131         # Do not call functions when jit is used
   1132         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
    455 
    456     def forward(self, input: Tensor) -> Tensor:
--> 457         return self._conv_forward(input, self.weight, self.bias)
    458 
    459 class Conv3d(_ConvNd):

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    452                             _pair(0), self.dilation, self.groups)
    453         return F.conv2d(input, weight, bias, self.stride,
--> 454                         self.padding, self.dilation, self.groups)
    455 
    456     def forward(self, input: Tensor) -> Tensor:

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Any advice would be extremely helpful @ilovescience Tanishq. Thank you

@VigneshBaskaran sometimes when you get this kind of CUDA error, the real problem under the hood is that your GPU is out of memory. Can you check that your GPU still has a decent amount of free memory?
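You can also check from inside the notebook. A minimal sketch, assuming a reasonably recent PyTorch (torch.cuda.mem_get_info is not available in very old versions):

```python
import torch

if torch.cuda.is_available():
    # Free/total device memory as the driver sees it (includes other processes).
    free_b, total_b = torch.cuda.mem_get_info()
    print(f"free: {free_b / 1e9:.2f} GB of {total_b / 1e9:.2f} GB")
    # What this PyTorch process itself holds via its caching allocator.
    print(f"allocated by torch: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
else:
    print("no CUDA device visible")
```

nvidia-smi reports the same driver-level numbers from the shell.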

Hi @dpoulopoulos ,
Thank you very much for your response. I have been tracking GPU memory using nvidia-smi as well as RAM using htop. I am running on a K80 with a batch size of 1, and I have more than 4 GB of GPU memory free. I still couldn’t figure out a solution :frowning:

That’s a really old GPU - maybe try running on Colab, where you’ll get a more up to date GPU? I’m not sure if this code has been tested on a K80.
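For anyone comparing GPUs, a quick way to see what device a session gives you and its compute capability; the K80 is compute capability 3.7, an older architecture that some prebuilt library binaries no longer target:

```python
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{name}: compute capability {major}.{minor}")
else:
    print("no CUDA device visible")
```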

Thank you @jeremy. Could you please let me know which of these you would recommend:

  1. NVIDIA T4
  2. NVIDIA V100
  3. NVIDIA Tesla P4
  4. NVIDIA Tesla P100

I shall move to a new GPU. Thank you very much @jeremy

Hi @jeremy . I migrated to V100 and the error doesn’t occur anymore. Thank you very much for your help.

hi Vignesh,

I see you just joined a few days ago, so you might not have picked up on the house rule…
Jeremy is obviously popular around here, so if everyone @mentioned him all the time, he’d never get any work done. So he generally asks not to be @mentioned unless the house is burning down.
Just thank him in text only and he will see it. Or better, to minimise traffic, give thanks by just “liking” a post.

It was a newbie action, so no foul and no need to apologise, but please read these:

And now, welcome to the community. I hope you have fun and have lots of questions that test my own understanding.


I am sorry, Ben. I will read and follow the rules.
