cudaGetDevice() failed 01_matmul.ipynb

KevinB · April 29, 2019, 9:05pm

I followed Jeremy’s instructions here and am running into an issue with my CUDA version.

I’m using dev_swift/01_matmul.ipynb.

Does Swift have an equivalent of fastai’s show_install()?

I can tell that CUDA 10.0 is on my machine because nvcc --version reports this:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

But when I run the following Swift code, I get an error:

let zeros = Tensor<Float>(zeros: [1,4,5])
let ones  = Tensor<Float>(ones: [12,4,5])
let twos  = Tensor<Float>(repeating: 2.0, shape: [2,3,4,5])
let range = Tensor<Int32>(rangeFrom: 0, to: 32, stride: 1)

Fatal error: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version: file /swift-base/swift/stdlib/public/TensorFlow/CompilerRuntime.swift, line 681
Current stack trace:
0    libswiftCore.so                    0x00007fefd366de00 _swift_stdlib_reportFatalErrorInFile + 115
1    libswiftCore.so                    0x00007fefd35b606c <unavailable> + 3035244
2    libswiftCore.so                    0x00007fefd35b615e <unavailable> + 3035486
3    libswiftCore.so                    0x00007fefd33fda12 <unavailable> + 1231378
4    libswiftCore.so                    0x00007fefd3582d42 <unavailable> + 2825538
5    libswiftCore.so                    0x00007fefd33fcef9 <unavailable> + 1228537
6    libswiftTensorFlow.so              0x00007fefd06d6022 <unavailable> + 598050
7    libswiftTensorFlow.so              0x00007fefd06d4770 checkOk(_:file:line:) + 508
8    libswiftTensorFlow.so              0x00007fefd06df940 _ExecutionContext.init() + 1959
9    libswiftTensorFlow.so              0x00007fefd06df710 _ExecutionContext.__allocating_init() + 64
10   libswiftTensorFlow.so              0x00007fefd06df6fc <unavailable> + 636668
11   libpthread.so.0                    0x00007fefe4a63827 <unavailable> + 63527
12   libswiftCore.so                    0x00007fefd3630e80 swift_once + 102
13   libswiftTensorFlow.so              0x00007fefd06d8450 _ExecutionContext.global.unsafeMutableAddressor + 23
14   libswiftTensorFlow.so              0x00007fefd06d65c0 _TFCGetGlobalEagerContext() + 98
15   libswiftTensorFlow.so              0x00007fefd06f28c0 _swift_tfc_GetGlobalEagerContext + 9
Current stack trace:
	frame #7: 0x00007fefe4a63827 libpthread.so.0`__pthread_once_slow(once_control=0x00007fefd0921a90, init_routine=(libstdc++.so.6`std::__once_proxy() at mutex.cc:78:5)) at pthread_once.c:116
	frame #15: 0x00007feffff15259 $__lldb_expr36`main at <REPL>:1

Also, a little further up in the notebook, this might be relevant. Running this:

%install-location $cwd/swift-install
%install '.package(path: "$cwd/FastaiNotebook_00_load_data")' FastaiNotebook_00_load_data

gives this output:

Installing packages:
	.package(path: "/home/kbird/git/fastai_docs/dev_swift/FastaiNotebook_00_load_data")
		FastaiNotebook_00_load_data
With SwiftPM flags: []
Working in: /tmp/tmpza9ziw9w/swift-install
/home/kbird/swift/usr/bin/swift-build: /home/kbird/anaconda3/envs/fastai/lib/libuuid.so.1: no version information available (required by /home/kbird/swift/usr/lib/swift/linux/libFoundation.so)
/home/kbird/swift/usr/bin/swiftc: /home/kbird/anaconda3/envs/fastai/lib/libuuid.so.1: no version information available (required by /home/kbird/swift/usr/bin/swiftc)
Compile Swift Module 'jupyterInstalledPackages' (1 sources)
/home/kbird/swift/usr/bin/swiftc: /home/kbird/anaconda3/envs/fastai/lib/libuuid.so.1: no version information available (required by /home/kbird/swift/usr/bin/swiftc)

/home/kbird/swift/usr/bin/swift: /home/kbird/anaconda3/envs/fastai/lib/libuuid.so.1: no version information available (required by /home/kbird/swift/usr/bin/swift)

/home/kbird/swift/usr/bin/swift: /home/kbird/anaconda3/envs/fastai/lib/libuuid.so.1: no version information available (required by /home/kbird/swift/usr/bin/swift)

Initializing Swift...
Installation complete!

KevinB · May 3, 2019, 1:00pm

The problem in my case turned out to be that my NVIDIA driver was too old for the CUDA runtime, which is what the "CUDA driver version is insufficient" message was saying. I upgraded to 418.56 and now things are working properly.
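
For anyone hitting the same error: nvcc --version only reports the toolkit version, not the driver version, and the driver is what the error complains about. A quick check along these lines may help before upgrading. It is only a sketch. It assumes the CUDA runtime in use is 10.0, for which NVIDIA's release notes list 410.48 as the minimum Linux driver, and it assumes GNU sort (for sort -V). MIN_DRIVER and version_ge are names I made up for the example.

```shell
#!/usr/bin/env bash
# Assumption: the CUDA runtime in use is 10.0, whose minimum Linux driver
# per NVIDIA's release notes is 410.48. Adjust for other runtimes.
MIN_DRIVER="410.48"

# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask the driver for its own version (first GPU only).
    driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
    if [ -z "$driver" ]; then
        echo "nvidia-smi did not report a driver version"
    elif version_ge "$driver" "$MIN_DRIVER"; then
        echo "driver $driver meets the $MIN_DRIVER minimum"
    else
        echo "driver $driver is older than $MIN_DRIVER; upgrade it"
    fi
else
    echo "nvidia-smi not found; is the NVIDIA driver installed?"
fi
```

If the script reports a driver below the minimum, upgrading the driver (rather than reinstalling the CUDA toolkit) is the thing to try.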

I used the .run file directly. The steps for me:

1. Download the new driver's .run file with wget. Here is the one I used:
   wget http://us.download.nvidia.com/XFree86/Linux-x86_64/418.56/NVIDIA-Linux-x86_64-418.56.run
   If you have anything besides a 1080 Ti, go here to find the correct driver for you: https://www.nvidia.com/Download/Find.aspx
2. Press Ctrl+Alt+F1 to switch to a command line.
3. Stop the X server that is running the graphics: sudo service lightdm stop
4. Make the .run file executable: chmod +x NVIDIA-Linux-x86_64-418.56.run
5. Run the .run file with sudo: sudo ./NVIDIA-Linux-x86_64-418.56.run
6. Follow the installer's prompts (I guessed on a lot of the yes/no questions).
7. Verify it worked properly: nvidia-smi
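
The steps above can be sketched as a small script. Treat it as a rough outline rather than a tested installer: the step function, the DRY_RUN switch, and the variable names are made up for illustration, and switching to a text console with Ctrl+Alt+F1 still has to be done by hand. By default it only prints the commands; set DRY_RUN=0 to actually run them, and only from a text console, since stopping lightdm ends the graphical session.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the manual steps. Names are illustrative only.
DRIVER_VERSION="${DRIVER_VERSION:-418.56}"
RUN_FILE="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"
URL="http://us.download.nvidia.com/XFree86/Linux-x86_64/${DRIVER_VERSION}/${RUN_FILE}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print a command, and execute it only when DRY_RUN is 0.
step() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

upgrade_driver() {
    step wget "$URL" || return 1                 # download the installer
    step sudo service lightdm stop || return 1   # stop the X server
    step chmod +x "$RUN_FILE" || return 1        # make it executable
    step sudo "./$RUN_FILE" || return 1          # run the interactive installer
    step nvidia-smi                              # verify the new driver loads
}

upgrade_driver
```

The installer itself is still interactive, so the yes/no prompts have to be answered by hand.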