However, after rebooting the instance, the same problem still exists.
I then followed the instructions in the link http://www.linuxandubuntu.com/home/how-to-install-latest-nvidia-drivers-in-linux that you provided. However, after following those steps and rebooting the instance, lsmod | grep nvidia gives no output, which according to the post means the installation failed. As usual, nvidia-smi is not working as expected.
I wonder why the installation failed. Did apt-get install cuda-drivers show any errors?
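If you no longer have the terminal output, apt and DKMS both keep logs you can check (the DKMS path assumes the driver is built through DKMS, as it is later in this thread):

```
# apt keeps the full terminal output of past installs, including any errors
grep -B2 -A10 nvidia /var/log/apt/term.log

# DKMS keeps a per-kernel build log for the driver module, if the build got that far
cat /var/lib/dkms/nvidia-*/*/build/make.log
```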
I googled what lsmod does:
“It shows which loadable kernel modules are currently loaded.”
Maybe the driver did install but just isn’t loaded yet. Try running apt-get install cuda-drivers again; if it was installed successfully, apt should report that the package is already installed.
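If apt says the package is already installed but lsmod still shows nothing, you can check whether the kernel module was actually built and try loading it by hand. A minimal sketch (module and package names assume the standard NVIDIA driver packages):

```
# Is the driver package installed at all?
dpkg -l | grep -i nvidia

# Did DKMS build the module for the kernel you are running?
uname -r
dkms status

# Try loading the module manually; an error here points at the real problem
sudo modprobe nvidia
lsmod | grep nvidia
```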
I had a somewhat similar problem when I installed the driver on my Ubuntu system. I think the problem might be that the driver is not in use.
I tried re-installing everything following http://www.linuxandubuntu.com/home/how-to-install-latest-nvidia-drivers-in-linux, to see from the logs what exactly is going on. As before, the installation didn’t work. I also noticed a few kernel-related error messages during the install, but I can’t really figure out what they are about. I am attaching the logs…
sudo apt-get install nvidia-390
Reading package lists… Done
Building dependency tree
Reading state information… Done
The following packages were automatically installed and are no longer required:
libllvm4.0 linux-aws-headers-4.4.0-1022 linux-headers-4.4.0-1022-aws linux-image-4.4.0-1022-aws linux-image-4.4.0-1038-aws
Use ‘sudo apt autoremove’ to remove them.
The following additional packages will be installed:
libcuda1-390 nvidia-opencl-icd-390 nvidia-prime nvidia-settings
The following NEW packages will be installed:
libcuda1-390 nvidia-390 nvidia-opencl-icd-390 nvidia-prime nvidia-settings
0 upgraded, 5 newly installed, 0 to remove and 6 not upgraded.
Need to get 0 B/81.5 MB of archives.
After this operation, 362 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Selecting previously unselected package nvidia-390.
(Reading database … 261447 files and directories currently installed.)
Preparing to unpack …/nvidia-390_390.59-0ubuntu0~gpu16.04.1_amd64.deb …
Unpacking nvidia-390 (390.59-0ubuntu0~gpu16.04.1) …
Selecting previously unselected package libcuda1-390.
Preparing to unpack …/libcuda1-390_390.59-0ubuntu0~gpu16.04.1_amd64.deb …
Unpacking libcuda1-390 (390.59-0ubuntu0~gpu16.04.1) …
Selecting previously unselected package nvidia-opencl-icd-390.
Preparing to unpack …/nvidia-opencl-icd-390_390.59-0ubuntu0~gpu16.04.1_amd64.deb …
Unpacking nvidia-opencl-icd-390 (390.59-0ubuntu0~gpu16.04.1) …
Selecting previously unselected package nvidia-prime.
Preparing to unpack …/nvidia-prime_0.8.2_amd64.deb …
Unpacking nvidia-prime (0.8.2) …
Selecting previously unselected package nvidia-settings.
Preparing to unpack …/nvidia-settings_396.24-0ubuntu0~gpu16.04.1_amd64.deb …
Unpacking nvidia-settings (396.24-0ubuntu0~gpu16.04.1) …
Processing triggers for libc-bin (2.23-0ubuntu10) …
Processing triggers for ureadahead (0.100.0-19) …
Processing triggers for man-db (2.7.5-1) …
Processing triggers for gnome-menus (3.13.3-6ubuntu3.1) …
Processing triggers for desktop-file-utils (0.22-1ubuntu5.1) …
Processing triggers for mime-support (3.59ubuntu1) …
Setting up nvidia-390 (390.59-0ubuntu0~gpu16.04.1) …
update-alternatives: using /usr/lib/nvidia-390/ld.so.conf to provide /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf (x86_64-linux-gnu_gl_conf) in auto mode
update-alternatives: using /usr/lib/nvidia-390/ld.so.conf to provide /etc/ld.so.conf.d/x86_64-linux-gnu_EGL.conf (x86_64-linux-gnu_egl_conf) in auto mode
update-alternatives: using /usr/lib/nvidia-390/alt_ld.so.conf to provide /etc/ld.so.conf.d/i386-linux-gnu_GL.conf (i386-linux-gnu_gl_conf) in auto mode
update-alternatives: using /usr/lib/nvidia-390/alt_ld.so.conf to provide /etc/ld.so.conf.d/i386-linux-gnu_EGL.conf (i386-linux-gnu_egl_conf) in auto mode
update-alternatives: using /usr/share/nvidia-390/glamor.conf to provide /usr/share/X11/xorg.conf.d/glamoregl.conf (glamor_conf) in auto mode
update-initramfs: deferring update (trigger activated)
update-initramfs: Generating /boot/initrd.img-4.4.0-59-generic
WARNING: missing /lib/modules/4.4.0-59-generic
Ensure all necessary drivers are built into the linux image!
depmod: ERROR: could not open directory /lib/modules/4.4.0-59-generic: No such file or directory
depmod: FATAL: could not search modules: No such file or directory
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.
depmod: WARNING: could not open /var/tmp/mkinitramfs_rG1dM5/lib/modules/4.4.0-59-generic/modules.order: No such file or directory
depmod: WARNING: could not open /var/tmp/mkinitramfs_rG1dM5/lib/modules/4.4.0-59-generic/modules.builtin: No such file or directory
Adding system user `nvidia-persistenced' (UID 116) ...
Adding new group `nvidia-persistenced' (GID 122) ...
Adding new user `nvidia-persistenced' (UID 116) with group `nvidia-persistenced' ...
Not creating home directory `/'.
Loading new nvidia-390-390.59 DKMS files…
First Installation: checking all kernels…
It is likely that 4.4.0-59-generic belongs to a chroot’s host
Building only for 4.4.0-1057-aws
Building for architecture x86_64
Building initial module for 4.4.0-1057-aws
Done.
nvidia_390:
Running module version sanity check.
Original module
No original module exists within this kernel
Installation
Installing to /lib/modules/4.4.0-1057-aws/updates/dkms/
nvidia_390_modeset.ko:
Running module version sanity check.
Original module
No original module exists within this kernel
Installation
Installing to /lib/modules/4.4.0-1057-aws/updates/dkms/
nvidia_390_drm.ko:
Running module version sanity check.
Original module
No original module exists within this kernel
Installation
Installing to /lib/modules/4.4.0-1057-aws/updates/dkms/
nvidia_390_uvm.ko:
Running module version sanity check.
Original module
No original module exists within this kernel
Installation
Installing to /lib/modules/4.4.0-1057-aws/updates/dkms/
depmod…
DKMS: install completed.
Setting up libcuda1-390 (390.59-0ubuntu0~gpu16.04.1) …
Setting up nvidia-opencl-icd-390 (390.59-0ubuntu0~gpu16.04.1) …
Setting up nvidia-prime (0.8.2) …
Setting up nvidia-settings (396.24-0ubuntu0~gpu16.04.1) …
Processing triggers for libc-bin (2.23-0ubuntu10) …
Processing triggers for initramfs-tools (0.122ubuntu8.11) …
update-initramfs: Generating /boot/initrd.img-4.4.0-1057-aws
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.
Processing triggers for ureadahead (0.100.0-19) …
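Reading the log, the module itself seems to have built fine: DKMS skipped 4.4.0-59-generic (there is no /lib/modules directory for it, hence the depmod errors) and built and installed the module for the running 4.4.0-1057-aws kernel. Some diagnostics that may help narrow down why it still isn’t loaded (version strings taken from the log above):

```
# Which kernel is actually running? The log built for 4.4.0-1057-aws
uname -r

# Which kernels have modules installed on disk?
ls /lib/modules/

# DKMS should report nvidia-390 as installed for the running kernel
dkms status nvidia-390

# If it does, try loading the module and check dmesg for why it might fail
sudo modprobe nvidia
dmesg | tail -n 20
```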
I had this same problem when I built my own server. I followed these instructions (kind of; I mostly used them to see which dependencies were required). Using sudo apt-get install nvidia-390 never worked for me, and I wound up reinstalling Ubuntu from scratch about four times before figuring out the correct sequence. In my experience it’s better to just install the CUDA toolkit, which takes care of everything. Here’s where to find the package for your distro. If I recall correctly, I did the following (assuming Ubuntu 16.04 and an x86_64 architecture):
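Roughly, the standard CUDA-repo route for Ubuntu 16.04 / x86_64 looks like this (version numbers here are illustrative; check NVIDIA’s download page for the current ones):

```
# Add NVIDIA's CUDA apt repository (example: CUDA 9.0 for Ubuntu 16.04 x86_64)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update

# Installing the cuda metapackage pulls in a matching driver as a dependency
sudo apt-get install cuda
sudo reboot
```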
I have figured out a workaround for this issue by switching to the Amazon Deep Learning AMI instead of the fastai AMI. torch.cuda.is_available() now returns True.
The only downside is that I had to download all of the fastai content from GitHub, but that is okay for me.
I am using Paperspace and facing the original issue: torch.cuda.is_available() returns False. And when I run nvidia-smi on the machine, I get “No Devices Were Found”.
Hey, yes, this issue is fixed. I contacted their customer support and restarted the VM a couple of times (as suggested by the support staff), and it worked like a charm.
Actually, Paperspace has been having some serious technical problems. For the last couple of days I haven’t been able to keep a VM running stably for more than 2 hours: either the GPU breaks, or they run out of space, or I can’t start the machine at all. I’m seriously considering moving to Amazon or setting up something of my own…
AWS is definitely a very good option to consider. With spot instances, AWS is the cheapest option available as of now. I minimize the cost further by switching to t2 instances when I am not training the model, and to make the storage persistent I swap out my root volume.
You can use the following fastai wiki link for dealing with spot instances:
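The gist of the root-volume swap, sketched with the AWS CLI (all IDs and the spot price are placeholders; in practice the wiki’s helper script automates this at boot):

```
# Request a GPU spot instance; spec.json holds the AMI, type (e.g. p2.xlarge), etc.
aws ec2 request-spot-instances --spot-price "0.30" \
    --launch-specification file://spec.json

# Keep your work on a volume that outlives the spot instance:
# detach it from the terminated instance...
aws ec2 detach-volume --volume-id vol-0123456789abcdef0

# ...then attach it to the new instance and boot from it instead of the fresh root
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sda1
```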
Hi, I am having a lot of trouble with the torch.cuda.is_available() returning False problem and would appreciate some help. I have a Windows 10 PC and started a Paperspace account and loaded all of the files as per the instructions; everything seems to have worked fine except the GPU doesn’t run.
Can you please tell me where you run apt-get install cuda-drivers from?
I tried running it from the Paperspace machine console but got this message:
(fastai) paperspace@psphyed93:~$ pwd
/home/paperspace
(fastai) paperspace@psphyed93:~$ apt-get install cuda-drivers
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
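That error just means apt needs root privileges; prefix the command with sudo:

```
sudo apt-get update
sudo apt-get install cuda-drivers
```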
Problem - ‘torch.cuda.is_available()’ returns ‘False’
I contacted Paperspace support and this was their reply -
It is a known issue that we’ve fixed on all future machines that are created, but you can fix it on your existing VM by running: sudo apt update && sudo apt upgrade
When you are prompted to update minions/salt, make sure you choose N. That should resolve this for you!
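After the upgrade and a reboot, you can verify the fix the same way the problem showed up in the first place:

```
# Should now list the GPU instead of "No Devices Were Found"
nvidia-smi

# Should print True
python -c "import torch; print(torch.cuda.is_available())"
```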