RuntimeError is cryptic: Unable to debug

I was trying to run lr_find() on my model, which takes structured data as input, and it throws this error:

RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:20

I have hit this error so many times now that I have lost count. Here is the notebook showing all of the failures:
https://colab.research.google.com/drive/1y0rx10E-ZsfQi0LdsFTOIyoPIUx40ej_#scrollTo=Zg7sfGdZr7xH

Can somebody please help? @radek, maybe you can?

Your error message is cryptic indeed, and you won’t get much more out of it while on the GPU. It usually comes from a bad index somewhere, but to pinpoint the exact location, my advice would be to run on the CPU (calling .cpu() on a tensor or model moves it there). There the error message will be clearer and you will see the exact line that is causing the problem.
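As a rough illustration of the CPU trick, here is a minimal sketch in plain PyTorch. It assumes the bad index is a categorical code that is too large for its embedding table, which is a common cause with structured data but is only an assumption here. `forward_on_cpu` is a hypothetical helper, not a fastai or PyTorch API. It moves a model and one batch of inputs to the CPU and runs a single forward pass, so the indexing bug surfaces as an ordinary Python exception with a traceback.

```python
import torch
import torch.nn as nn

def forward_on_cpu(model: nn.Module, *inputs: torch.Tensor):
    """Hypothetical helper: run one forward pass on the CPU so that an
    indexing bug raises a readable exception instead of a CUDA
    device-side assert."""
    model = model.cpu()                      # moves parameters to the CPU
    cpu_inputs = [x.cpu() for x in inputs]   # move every input tensor too
    try:
        return model(*cpu_inputs)
    except (IndexError, RuntimeError) as err:
        # Depending on the PyTorch version this is an IndexError or a
        # RuntimeError, but on the CPU the message names the real problem.
        print(f"CPU forward pass failed: {err}")
        raise

# Assumed example: an embedding with 5 rows (valid codes 0..4) receives
# the code 7, the kind of bug that shows up as a device-side assert on GPU.
emb = nn.Embedding(num_embeddings=5, embedding_dim=3)
bad_codes = torch.tensor([0, 2, 7])
```

With your own model, you would pass it one batch of inputs in the same way. Once the exception points at the offending layer, comparing `codes.max()` against that embedding's `num_embeddings` usually confirms the off-by-one or the unseen category.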

Note that, as with many GPU errors, you have to restart the kernel each time you think you have fixed the bug and rerun the notebook from the beginning.


@sgugger’s suggestion worked for me. I switched the device to “cpu”, restarted, and re-ran the whole notebook. The error went away. I then switched back to “cuda” and there was no error. Odd. It might have been a change that I made and later reverted that caused the error.