Cuda error 45: no kernel image - Pytorch 0.3.1.post2

MaheshBhosale · April 5, 2018, 6:33pm

Tried, with a new installation of VS2017, but the error persists.

Chris_Palmer · April 5, 2018, 7:55pm

Perhaps you should try the 0.3.1 version from the peterjc123 site. I see that your installation first uninstalled 0.3.1.post2 - but presumably that got further than this 0.3.0 version (it just crashed with the GPU warning). I am running 0.3.1 from the site, and it works fine. It does give me the warning about my GPU, but after that it carries on.

The other thing that differs between our approaches was that I did a conda uninstall pytorch before installing, whereas you evidently didn’t - so perhaps the other modules that pytorch depends on are more compatible with the previous 0.3.1 version you had installed with conda, and going back to it will work for you.

Finally, you are better off with 0.3.1 as it has many bug fixes.

MaheshBhosale · April 6, 2018, 7:02pm

Yes, I will do that. The weekend has finally arrived, so I hope to clear out these errors in time.

MaheshBhosale · April 8, 2018, 11:35am

Hi Chris,

I uninstalled the previous 0.3.0 and installed 0.3.1, which works as far as importing the modules.
I installed it with: conda install -c peterjc123 pytorch cuda90

torch.__version__
'0.3.1.post2'

But now I get a further error after the warning:

C:\Users\Mahesh.Bhosale\AppData\Local\Continuum\anaconda3\envs\fastai\lib\site-packages\torch\cuda\__init__.py:116: UserWarning:
Found GPU0 Quadro M1200 which is of cuda capability 5.0.
PyTorch no longer supports this GPU because it is too old.

warnings.warn(old_gpu_warn % (d, name, major, capability[1]))

RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at c:\anaconda2\conda-bld\pytorch_1519501749874\work\torch\lib\thc\generic/THCTensorMathPointwise.cu:367

I think I should now try compiling from source, which seems to be the last option.
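
For readers who land here with the same message: "no kernel image is available" generally means the installed binary contains no compiled kernels for the GPU's compute capability (5.0 for the Quadro M1200 above). A minimal sketch for checking what the installed build reports, assuming a PyTorch version that exposes torch.cuda.get_device_capability; the helper name describe_torch_gpu is made up for illustration:

```python
def describe_torch_gpu():
    """Return a small dict describing the installed PyTorch build and GPU 0."""
    try:
        import torch
    except ImportError:
        # Covers both "not installed" and "DLL load failed" on Windows.
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        # CUDA version the binary was built against; None on CPU-only builds.
        "built_for_cuda": getattr(torch.version, "cuda", None),
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["gpu_name"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability, e.g. (5, 0).
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    print(describe_torch_gpu())
```

This only reports what the build and driver say; it does not tell you which architectures the binary was compiled for, so a mismatch still has to be confirmed against the package's own notes.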

nok · April 8, 2018, 12:08pm

Did you already try pip installing the 0.3.1 wheel?

Chris_Palmer · April 8, 2018, 12:49pm

That was the wrong move. You are back to where you started from! You need the version supplied by the .whl file, which, if it matches your environment, will get you past the crash after the old-GPU warning. It's the equivalent of installing from source but (theoretically) without the complexity.

Conda uninstall pytorch. Start again and pip install the 0.3.1 version that matches your GPU architecture, Windows 10, and VS 2017.

MaheshBhosale · April 9, 2018, 4:04am

Yes, I tried that already. It gives me the same DLL error as before when importing the modules.

MaheshBhosale · April 9, 2018, 4:06am

I will give it another try; I already tried it with the .whl.

Chris_Palmer · April 9, 2018, 5:49am

It seems you are really stuck on this. One final check - are you doing the pip install of the .whl from within the fastai environment?

MaheshBhosale · April 9, 2018, 8:31am

yes within fastai environment.

Chris_Palmer · April 9, 2018, 12:26pm

Hmmm. Good luck with it - I wish I was more technically capable to help you out more!

nok · April 9, 2018, 2:57pm

Just one final stupid question: did you run the notebook directly, or did you import the library and get the error?
I have seen this DLL problem, but it vanished with PyTorch 0.3.1. Make sure you uninstall any legacy PyTorch, then pip install the 0.3.1 wheel. Then start a notebook and try importing the libraries.

I once got the same issue when I had one console hosting the Jupyter notebook and another console installing a package. It is safer to restart the notebook or the machine when in doubt.

MaheshBhosale · April 10, 2018, 6:52pm

Thanks Chris for your kind support. I will keep you posted on how it goes.

MaheshBhosale · April 10, 2018, 6:57pm

Thanks nok,

I have gone both ways: importing the library in the Python interpreter as well as running the notebook. But I don't understand the reasoning behind this question. Am I missing something trivial?

Yes, I confirmed there are no legacy torch modules that I can see. We are using a conda environment, which should isolate it from everything outside the environment, and I have no other PyTorch installed.

Yes, I will take care to restart the notebook and/or the machine whenever needed.

nok · April 11, 2018, 2:48am

In my experience, when I have one notebook running and use another console to uninstall and install a package, the changes are sometimes not reflected immediately; it still tries to go to the old path for some reason (or maybe I just messed something up).

Conda does not solve every problem, especially if you use pip too, which can confuse conda. Do you have a space in your directory path? If possible, try uninstalling everything and reinstalling Anaconda.

https://github.com/pytorch/pytorch/issues/4518
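
One way to check the "old path" concern is to ask Python where it would load torch from, without actually importing it. This is only a sketch using the standard library; locate_module is a hypothetical helper name, and it should be run from the same environment (and the same notebook kernel) that shows the problem:

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file path Python would import `name` from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # The interpreter path shows which environment is really active.
    print("interpreter:", sys.executable)
    print("torch from: ", locate_module("torch"))
```

If the reported path lies outside the fastai environment, a stale or second installation is probably being picked up.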

MaheshBhosale · April 11, 2018, 7:55pm

Thanks, @Chris_Palmer and @nok , you have been really helpful to me.

I was able to resolve the issue by downloading the right versions of cuDNN and CUDA and adding their locations to the PATH environment variable.

I overlooked some basic requirements mentioned in the peterjc123 GitHub repo. It clearly mentions:

For all versions
Windows x64
Python x64 3.5 / 3.6
MKL/Numpy/PyYAML

For GPU versions
CUDA 8
cuDNN 6
NVTX (Visual Studio Integration in CUDA. If it fails to install, you can extract the CUDA installer exe and find the NVTX installer under CUDAVisualStudioIntegration)

I had CUDA 9.1 and cuDNN 7.1.2, which I replaced with CUDA 8 and cuDNN 6. I also added the CUDA and cuDNN paths to the PATH environment variable (which resolved the missing DLL issue), following https://github.com/tensorflow/tensorflow/issues/10033. Then I installed the right .whl from the peterjc123 repo, https://github.com/peterjc123/pytorch-scripts.
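
A rough way to confirm the PATH step worked is to look for the runtime DLLs in the PATH directories. The sketch below uses only the standard library; find_on_path is a hypothetical helper, and the two file names are my assumption of what CUDA 8.0 and cuDNN 6 ship on 64-bit Windows, so adjust them to whatever your install actually contains:

```python
import os


def find_on_path(filenames, path=None):
    """Map each filename to the first PATH directory containing it, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    dirs = [d for d in path.split(os.pathsep) if d]
    found = {}
    for name in filenames:
        found[name] = next(
            (d for d in dirs if os.path.isfile(os.path.join(d, name))),
            None,
        )
    return found


# Assumed DLL names for CUDA 8.0 and cuDNN 6 (not taken from this thread).
WANTED = ["cudart64_80.dll", "cudnn64_6.dll"]

if __name__ == "__main__":
    for name, where in find_on_path(WANTED).items():
        print(name, "->", where or "NOT FOUND on PATH")
```

Open a fresh console before running it, since a console started before the PATH change will still see the old value.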

Again, Thanks a lot for your valuable time and interest in solving this issue.

Chris_Palmer · April 11, 2018, 8:49pm

Great you got it sorted!

FYI - I have CUDA 9.0 and cuDNN 7 and they work fine - maybe the peterjc123 requirements need updating - but for sure CUDA 9.1 does not work, so you were correct to back-track on that. CUDA requirements are also discussed on the fast.ai forums, and it seems that since both PyTorch and TensorFlow have moved to CUDA 9.0, it's OK (and recommended) to install CUDA 9.0.

I hope this doesn’t confuse the issue for you!

MaheshBhosale · April 12, 2018, 5:26am

Thanks for confirmation.

Chris_Palmer · April 23, 2018, 12:02pm

Hi @MaheshBhosale and @nok

Thought this might interest you!

nok · April 23, 2018, 1:36pm

Sounds great, but the next release gonna be 0.5.0?