I’m following the instructions here: https://course.fast.ai/start_gcp.html
I can set everything up and open the notebook at http://localhost:8080/notebooks/tutorials/fastai/course-v3/nbs/dl1/lesson1-pets.ipynb.
Problem: the Nvidia driver fails to install, so I can’t use the GPU.
When I ssh into the instance, the following prompt shows up:
This VM requires Nvidia drivers to function correctly. Installation takes ~1 minute.
Would you like to install the Nvidia driver? [y/n]
After answering y, this is the output:
Would you like to install the Nvidia driver? [y/n] y
Installing Nvidia driver.
Downloading driver from GCS location gs://nvidia-drivers-us-public/tesla/418.87.01/NVIDIA-Linux-x86_64-418.87.01.run
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 418.87.01..............
WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver.
WARNING: nvidia-installer was forced to guess the X library path '/usr/lib' and X module path '/usr/lib/xorg/modules'; these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install the
`pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.
WARNING: Unable to find a suitable destination to install 32-bit compatibility libraries. Your system may not be set up for 32-bit compatibility. 32-bit compatibility files will not be installed; if you wish to install them, re-run the
installation and set a valid directory with the --compat32-libdir option.
ERROR: Error while parsing line 680 of '/var/lib/nvidia/log'.
ERROR: Uninstallation failed.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
Nvidia driver installed.
Oddly enough, the last message is "Nvidia driver installed.", but the driver clearly hasn’t been installed, since I get the same prompt every time I open a new ssh connection.
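To check this without relying on the login prompt, something like the sketch below could be run on the instance. It assumes that `nvidia-smi` is the right tool to ask and that it exits with a non-zero status when it cannot talk to the driver. The function name `driver_status` and its three return values are made up for illustration.

```python
import shutil
import subprocess


def driver_status(cmd="nvidia-smi"):
    """Return 'missing', 'ok' or 'failing' for the given driver query tool.

    Assumption: a zero exit status from nvidia-smi means the driver is
    loaded and usable; anything else is treated as a failure.
    """
    # The binary is not on PATH at all, so the driver tools were not installed.
    if shutil.which(cmd) is None:
        return "missing"
    try:
        result = subprocess.run(
            [cmd],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=30,
        )
    except (subprocess.TimeoutExpired, OSError):
        # The tool exists but could not be run to completion.
        return "failing"
    return "ok" if result.returncode == 0 else "failing"


if __name__ == "__main__":
    print("driver status:", driver_status())
```

On my instance I would expect anything other than "ok" here, which would match the prompt reappearing on every login.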
The error log is:
jupyter@my-fastai-instance:~$ cat /var/log/nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Sun Feb 23 08:20:18 2020
installer version: 418.87.01
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
nvidia-installer command line:
./nvidia-installer
--dkms
-a
-s
--no-drm
Using built-in stream user interface
-> Detected 8 CPUs online; setting concurrency level to 8.
-> Installing NVIDIA driver version 418.87.01.
-> There appears to already be a driver installed on your system (version: 418.87.01). As part of installing this driver (version: 418.87.01), the existing driver will be uninstalled. Are you sure you want to continue? (Answer: Continue installation)
WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver.
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
WARNING: nvidia-installer was forced to guess the X library path '/usr/lib' and X module path '/usr/lib/xorg/modules'; these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.
WARNING: Unable to find a suitable destination to install 32-bit compatibility libraries. Your system may not be set up for 32-bit compatibility. 32-bit compatibility files will not be installed; if you wish to install them, re-run the installation and set a valid directory with the --compat32-libdir option.
-> Will install GLVND GLX client libraries.
-> Will install GLVND EGL client libraries.
-> Skipping GLX non-GLVND file: "libGL.so.418.87.01"
-> Skipping GLX non-GLVND file: "libGL.so.1"
-> Skipping GLX non-GLVND file: "libGL.so"
-> Skipping EGL non-GLVND file: "libEGL.so.418.87.01"
-> Skipping EGL non-GLVND file: "libEGL.so"
-> Skipping EGL non-GLVND file: "libEGL.so.1"
-> Parsing log file:
-> error.
ERROR: Error while parsing line 680 of '/var/lib/nvidia/log'.
ERROR: Uninstallation failed.
Looking for install checker script at ./libglvnd_install_checker/check-libglvnd-install.sh
executing: '/bin/sh ./libglvnd_install_checker/check-libglvnd-install.sh'...
Checking for libglvnd installation.
Checking libGLdispatch...
Checking libGLdispatch dispatch table
Checking call through libGLdispatch
All OK
libGLdispatch is OK
Checking for libGLX
libGLX is OK
Checking for libEGL
Can't load libEGL from libEGL.so.1: libEGL.so.1: cannot open shared object file: No such file or directory
Checking entrypoint library libOpenGL.so.0
Checking call through libGLdispatch
Checking call through library libOpenGL.so.0
All OK
Entrypoint library libOpenGL.so.0 is OK
Checking entrypoint library libGL.so.1
Checking call through libGLdispatch
Checking call through library libGL.so.1
dlopen("libGL.so.1") failed: libGL.so.1: cannot open shared object file: No such file or directory
Found libglvnd libraries: libOpenGL.so.0 libGLX.so.0 libGLdispatch.so.0
Missing libglvnd libraries: libGL.so.1 libEGL.so.1
-> An incomplete installation of libglvnd was found. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries. (Answer: Abort installation.)
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
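Since the log is long, here is a small sketch for pulling out just the ERROR lines. It only assumes that the installer prefixes those lines with `ERROR:`, as in the output above. The helper name `installer_errors` is hypothetical, and the log path is the one from the session above.

```python
from pathlib import Path


def installer_errors(log_text):
    """Return the distinct 'ERROR:' lines from installer output, in order."""
    errors = []
    for line in log_text.splitlines():
        line = line.strip()
        # Keep each error once, even if the installer repeats it.
        if line.startswith("ERROR:") and line not in errors:
            errors.append(line)
    return errors


if __name__ == "__main__":
    log_path = Path("/var/log/nvidia-installer.log")
    if log_path.exists():
        for err in installer_errors(log_path.read_text(errors="replace")):
            print(err)
    else:
        print("no installer log found at", log_path)
```

Run against the log above, this should leave three lines: the parse error at line 680 of '/var/lib/nvidia/log', "Uninstallation failed.", and the final "Installation has failed" message.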