Torch - undefined symbol: nvrtcGetProgramLogSize

mcsieber · March 15, 2019, 11:10pm

Update: Resolved by creating a new, “clean” Gradient fastai 1-v3 container and then upgrading to fastai 1.0.49.

Thanks for the responses!

Mark

=========================================================

Has anyone else encountered this problem when running a Gradient notebook?

**import torch**
---------------------
ImportError
...
--> 102 from torch._C import *
...
ImportError: /opt/conda/envs/fastai/lib/python3.6/site-packages/torch/lib/libtorch.so.1: undefined symbol: nvrtcGetProgramLogSize

Environment:

sys.platform   =  linux
fastai.version =  1.0.49
sys.prefix     =  /opt/conda/envs/fastai
sys.version    =  3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0]
(torch version is 1.0.1)

xeTaiz · March 15, 2019, 11:41pm

What GPU and CUDA version?
It seems this symbol belongs to CUDA.
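For anyone who wants to check this on their own machine: `nvrtcGetProgramLogSize` is a function in NVRTC, NVIDIA's runtime compilation library (`libnvrtc`). Below is a minimal sketch of a check that does not require importing torch. The helper name `find_symbol` is hypothetical, and the sketch assumes the library is discoverable through the system's standard search mechanism. `ctypes.util.find_library` does not necessarily look inside a conda environment's `lib` directory, so a negative result is only a hint, not proof that the library is missing.

```python
import ctypes
import ctypes.util


def find_symbol(lib_name, symbol):
    """Hypothetical helper: report whether a shared library exports a symbol.

    Returns (path, has_symbol). path is None when the library cannot be
    located or loaded through the standard search mechanism.
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None, False
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        # The library was found but could not be loaded.
        return None, False
    # Attribute lookup on a CDLL raises AttributeError for a missing symbol,
    # so hasattr() works as an existence test.
    return path, hasattr(lib, symbol)


# Assumption: the NVRTC library is registered under the name "nvrtc".
print(find_symbol("nvrtc", "nvrtcGetProgramLogSize"))
```

If this reports that the library is not found while the installed torch build expects CUDA, that would be consistent with the ImportError above, though it does not by itself confirm the cause.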

mcsieber · March 16, 2019, 12:28am

Thanks for the quick response Dominik!

Briefly: I now think this is “user error” - mine - of some sort, and I don’t want to bother the forums with the issue until I have more confidence that there is actually an issue. I’m going to try to get this post closed or demoted or, ideally, deleted.

I do appreciate the links to more information.

Since you asked, and if you are still reading:

Yesterday, the problem occurred on both Paperspace P6000 GPU (CUDA 9.2, I think) and C2 (non-GPU) VMs.

Two weeks ago, I did not see this behavior in either environment. And I don’t see it on my local machine (non-Nvidia GPU, so I run Torch in CPU mode only).

Today I updated fastai and associated packages, and retried. It now seems to work on P4000 GPU machines (though I’m now seeing other strange import issues), but not on the C2 machines. Hence my reluctance to pursue this until I understand a little more about what is going wrong.

Thanks again.

Mark Sieber

rors101 · August 5, 2019, 6:32am

Hi @mcsieber, did you ever get to the bottom of this? I’ve just hit the same error. I’m using a CPU-only cloud server just to get the dataset labelled and the first loop running on a cheap machine before switching over to a GPU. Could this be why?