PSA: Ubuntu 16.04 `apt update` breaks CUDA-8.0

Ubuntu 16.04 apt update will install nvidia driver 375.39. However, CUDA-8.0 only works against 375.26. If your TensorFlow starts to hang, you can check whether it’s this issue via:

$ dpkg -l | grep 375
ii  cuda-drivers                          375.26-1                                 amd64        CUDA Driver meta-package
ii  libcuda1-375                          375.26-0ubuntu1                          amd64        NVIDIA CUDA runtime library
ii  libxnvctrl0                           375.26-0ubuntu1                          amd64        NV-CONTROL X extension (runtime library)
ii  nvidia-375                            375.26-0ubuntu1                          amd64        NVIDIA binary driver - version 375.26
ii  nvidia-375-dev                        375.26-0ubuntu1                          amd64        NVIDIA binary Xorg driver development files
ii  nvidia-modprobe                       375.26-0ubuntu1                          amd64        Load the NVIDIA kernel driver and create device files
ii  nvidia-opencl-icd-375                 375.26-0ubuntu1                          amd64        NVIDIA OpenCL ICD
ii  nvidia-settings                       375.26-0ubuntu1                          amd64        Tool for configuring the NVIDIA graphics driver
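
If eyeballing that listing is error-prone, here is a rough sketch that boils it down to the distinct driver versions installed. The function name `driver_versions` is made up for illustration, and it assumes the standard `dpkg -l` column layout shown above.

```shell
#!/usr/bin/env bash
# Print each distinct upstream version among installed 375-series packages.
# Reads `dpkg -l` output on stdin, so it also works on saved output.
driver_versions() {
    # Keep installed ("ii") rows whose version starts with 375., then
    # drop the packaging suffix (e.g. "-0ubuntu1") and de-duplicate.
    awk '$1 == "ii" && $3 ~ /^375\./ { sub(/-.*/, "", $3); print $3 }' | sort -u
}

# Usage on the affected machine:
#   dpkg -l | driver_versions
```

One line of output means the packages agree; more than one line means they are out of sync.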

If the driver versions don’t align (in the listing above every package is at 375.26), then you’ll have to downgrade the nvidia driver. (Specifically, nvidia-modprobe and nvidia-settings will be out of sync.) For example, for nvidia-375:

$ sudo apt install nvidia-375=375.26-0ubuntu1
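
Since several packages can drift at once, a small helper for building the `pkg=version` arguments may save typing. This is only a sketch: `pin_args` is a hypothetical name, it assumes 375.26-0ubuntu1 is still offered by your configured repositories, and the `apt-mark hold` step is my own suggestion for stopping the next upgrade from undoing the downgrade, not something from the original post.

```shell
#!/usr/bin/env bash
# Print one "package=version" argument per line for the given packages.
pin_args() {
    local version=$1
    shift
    local pkg
    for pkg in "$@"; do
        printf '%s=%s\n' "$pkg" "$version"
    done
}

# Usage (package names taken from the dpkg listing above):
#   sudo apt install $(pin_args 375.26-0ubuntu1 nvidia-375 nvidia-modprobe nvidia-settings)
#   sudo apt-mark hold nvidia-375 nvidia-modprobe nvidia-settings
```

Run `sudo apt-mark unhold` on the same packages once a compatible CUDA is installed.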

I got caught by this today.


I had this problem as well. I installed the latest CUDA and it seemed to fix itself.

Had the same issue and couldn’t get the simple reinstall to work. I had to purge the nvidia and cuda files and reinstall the latest cuda toolbox. What a pain.

On a related note, what do we think of Intel’s new graphics firmware update?
https://01.org/linuxgraphics/downloads/intel-graphics-update-tool-linux-os-v2.0.4

Also, when reinstalling the Cuda 8.0 Toolbox I found better success downloading the .run file versus the .deb.

Having this issue myself. When you say ‘cuda toolbox’, do you mean the CUDA Toolkit? https://developer.nvidia.com/cuda-toolkit

Here are the steps I took to fix this:

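# Open a root shell
sudo -s
# Remove every nvidia package (pattern quoted so the shell does not expand it)
sudo apt-get purge 'nvidia*'
sudo apt-get autoremove
cd ~/downloads/
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run
# Stop the Jupyter notebook server, which may still be holding the GPU
killall -9 jupyter-notebook
sh cuda_8.0.61_375.26_linux-run
# Leave the root shell and confirm the driver loads
exit
nvidia-smi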
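
To make that last `nvidia-smi` check less of a visual scan, something like the following could compare the reported driver version with the one you expect. It is a sketch, not part of the steps above: `check_driver` and the `NVIDIA_SMI` override variable are hypothetical names, and the override exists only so the function can be tried without a GPU.

```shell
#!/usr/bin/env bash
# Compare the driver version reported by nvidia-smi with an expected one.
check_driver() {
    local expected=$1 actual
    # --query-gpu/--format print just the version, one line per GPU;
    # only the first GPU is inspected here.
    actual=$("${NVIDIA_SMI:-nvidia-smi}" --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: driver $actual"
    else
        echo "MISMATCH: driver is '$actual', expected '$expected'"
        return 1
    fi
}

# Usage: check_driver 375.26
```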

I fixed it by ssh’ing into my box and following Jeremy’s instructions, but I forgot to kill the X server, so the device driver installation failed. To get around that, I followed these directions:

  1. Kill your current X server session by typing sudo service lightdm stop or sudo stop lightdm
  2. Enter runlevel 3 (or 5) by typing sudo init 3 (or sudo init 5) and install your .run file.

You might be required to reboot when the installation finishes. If not, run sudo service lightdm start or sudo start lightdm to start your X server again.
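
To avoid repeating my mistake, a guard like this could be run before launching the .run file. It is a rough sketch under a few assumptions: the X server process is named `Xorg` (as on a stock Ubuntu 16.04 desktop), and `require_no_x` and the `PGREP` override are hypothetical names, the latter only there so the check can be tried on a machine without X.

```shell
#!/usr/bin/env bash
# Refuse to continue while an X server process is running.
require_no_x() {
    # pgrep -x matches the process name exactly.
    if "${PGREP:-pgrep}" -x Xorg >/dev/null; then
        echo "X server is running; stop lightdm first" >&2
        return 1
    fi
    echo "No X server process found"
}

# Usage:
#   require_no_x && sudo sh cuda_8.0.61_375.26_linux-run
```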
