Ubuntu 18.04.5 + GTX 1060 6GB + CUDA 11.2: clean install guide

Hello community. Over the last two days I managed to set up my own machine learning … machine.
I simply want to share my steps; they may be useful for someone.
Please note that I am a newbie to machine learning (as of January 2021). I can't give any solid advice, even though I'd love to. This guide is aimed at people who don't really know what to do (same as me :wink: ).


First, credit where credit is due. A big "Thank You" to the authors of the pages below; I was able to finish this setup by following their advice.

Santosh KS

https://medium.com/@santhoshachar08/build-your-own-deep-learning-machine-e6e2e3940765

Isaac Kimsey

https://medium.com/@IsaacJK/setting-up-a-ubuntu-18-04-1-lts-system-for-deep-learning-and-scientific-computing-fab19f7ca39d

Nvidia installation guide

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions




My PC is nothing special: I use a small SSD to boot the system and a larger HDD for files, games, movies, etc. The graphics card is a GTX 1060 6GB, and the processor is a mid-to-high-range Intel.


Step 0 :
Before starting, I went to the NVIDIA installation guide above and checked the list of supported video cards.

I was wondering whether I should install Ubuntu 18 LTS or 20 LTS. People were saying that 20 works well, so I wanted to get the latest version, but in the NVIDIA guide I found that 20.04 has limited validation (whatever that means), so I went with the safer option.

Step 1 :
Disconnected power to my video card, as this should make the rest of the installation easier. I have no idea why. If someone here wants to share their knowledge, you are welcome.


Step 2 :
Installed Ubuntu on the SSD. I reserved about 50GB of space for the system.

sudo apt-get update
sudo apt-get upgrade

The commands above update the package lists and upgrade the installed packages.

Step 3 :
Installed libraries

sudo apt-get install vim csh flex gfortran libgfortran3 g++ \
                     cmake xorg-dev patch zlib1g-dev libbz2-dev \
                     libboost-all-dev openssh-server libcairo2 \
                     libcairo2-dev libeigen3-dev lsb-core \
                     lsb-base net-tools network-manager \
                     git-core git-gui git-doc xclip gdebi-core

The command above installs some general packages needed for coding, ML, etc.
It takes care of the gcc compiler too.

gcc --version

The command above verifies that gcc is installed; it is required for development with CUDA. It is expected to be installed automatically on Linux, but it was missing in my case (or the command above was searching in the wrong place).
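If gcc really is missing, installing build-essential should fix it (my assumption here: build-essential is the standard Ubuntu meta-package that pulls in gcc, g++, and make):

sudo apt-get install build-essential
gcc --version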


Step 4 :
I validated that my compiler and the Boost library are working properly.

Create a file called ‘test_boost.cpp’ and add the following code:

#include <boost/lambda/lambda.hpp>
#include <iostream>
#include <iterator>
#include <algorithm>

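// Reads integers from stdin and prints each one multiplied by 3.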
int main()
{
    using namespace boost::lambda;
    typedef std::istream_iterator<int> in;

    std::for_each(
        in(std::cin), in(), std::cout << (_1 * 3) << " " );
}

Compile and execute the program.

g++-7 test_boost.cpp -o test_boost
echo 3 6 9 | ./test_boost

The output should be 9 18 27

The instructions above are copied from Isaac Kimsey's article.
The output was correct in my case.

Step 5 :
Set up kernel headers and development packages for CUDA.

My basic understanding, based on reading a few pages: programs and the Linux kernel communicate through interfaces, and the kernel headers describe those interfaces (and which versions to use). If I am wrong, please correct me; I am but a humble beginner.

uname -r

The command above shows the kernel version. Both the kernel headers AND the development packages need to match it. Whenever you upgrade or install something, versions can change, so remember to check that the kernel headers and development packages still match the running kernel after such operations.

sudo apt-get install linux-headers-$(uname -r) 

The command above installs the correct version of both.
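To double-check that the installed headers match the running kernel, you can compare the two by hand (just a sanity check using standard dpkg tooling):

uname -r
dpkg -l 'linux-headers-*' | grep '^ii'

The version in the installed header packages should match the output of uname -r.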

Step 6 :
Now I installed CUDA itself.

https://developer.nvidia.com/cuda-downloads

At the link above you choose your system, architecture, and installation type.
In my case it was Ubuntu 18.04, x86_64, deb (network).
If you choose a different installation method, the steps below will differ; you can find them in the NVIDIA installation guide.


wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda

The commands above download and install CUDA. They correspond to the choices made earlier, so they are only valid for this particular case.
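To verify that the package came from the NVIDIA repository and see which version was installed, apt can report it (just a sanity check):

apt policy cuda

This should list the installed version and show developer.download.nvidia.com as the source.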

Optional (important) step :
Turn off/disable the nouveau drivers. Those drivers were not present in my case, so I skipped this step. You can check whether they are loaded with the command below; no output means nouveau is not in use.

lsmod | grep nouveau

Edit the file /etc/default/grub

vim /etc/default/grub

and add the line

GRUB_CMDLINE_LINUX="modprobe.blacklist=nouveau nouveau.modeset=0 iommu=soft"

Then regenerate the GRUB configuration:

sudo update-grub
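For reference, the NVIDIA installation guide disables nouveau a slightly different way: it blacklists the module via modprobe and rebuilds the initramfs instead of editing GRUB. A sketch of that variant:

sudo tee /etc/modprobe.d/blacklist-nouveau.conf <<EOF
blacklist nouveau
options nouveau modeset=0
EOF
sudo update-initramfs -u

Either way, a reboot is needed for the change to take effect.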

Step 7 :
Shut down, reconnect the video card's power, and turn the machine back on.

Step 8 :
Set up PATH variable.

export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}

The command above adds the CUDA binaries to $PATH.

echo $PATH

After running the command above, there should be something like /usr/local/cuda-11.2/bin: at the beginning.
In my case I needed to repeat this step after each reboot. The reason is that export only modifies the current shell session, so each new terminal instance starts with an unmodified $PATH. To make the change permanent:

gedit ~/.bashrc

and add the line: export PATH="/usr/local/cuda-11.2/bin:$PATH"
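Alternatively, you can append the line from the terminal and reload it in the current session:

echo 'export PATH="/usr/local/cuda-11.2/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc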

Step 9 :
Power9 setup.
This step may be unnecessary, as my CPU is not a POWER9, but I followed the NVIDIA-recommended steps anyway.

systemctl status nvidia-persistenced

The command above checks whether nvidia-persistenced is enabled. In my case it showed something like: Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled;

If it is not enabled, enable it with:

sudo systemctl enable nvidia-persistenced

Step 10 :
Disable a udev rule:
I will be honest - I have no idea (as of now) what it does. The step comes from the NVIDIA installation guide linked above, if you want to read about it.

The file is located at: /lib/udev/rules.d/40-vm-hotadd.rules

The rule that needs to be disabled: SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"

sudo cp /lib/udev/rules.d/40-vm-hotadd.rules /etc/udev/rules.d

The command above copies the file to /etc/udev/rules.d, where it takes precedence over the original.

sudo sed -i '/SUBSYSTEM=="memory", ACTION=="add"/d' /etc/udev/rules.d/40-vm-hotadd.rules

The command above deletes the memory rule from the copy, as the NVIDIA instructions suggest.

Step 11 :

Install the persistence daemon:

/usr/bin/nvidia-persistenced --verbose
The command above needs to be run at boot. To do this, add it to the /etc/rc.local file.

That file did not exist on my system, so I created it with touch /etc/rc.local. Maybe because it was a fresh install, I don't know.
Edit it:
vim /etc/rc.local
and add:
/usr/bin/nvidia-persistenced --verbose
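One caveat: on Ubuntu 18.04, systemd only runs /etc/rc.local if the file starts with a shebang line and is executable. A minimal version of the whole file could look like this:

#!/bin/sh -e
# /etc/rc.local - runs at the end of boot
# start the NVIDIA persistence daemon
/usr/bin/nvidia-persistenced --verbose
exit 0

And make it executable:

sudo chmod +x /etc/rc.local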


Step 12 :
Reboot machine, verify installation.

cat /proc/driver/nvidia/version

The command above shows the driver version. In my case:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  460.27.04  Fri Dec 11 23:35:05 UTC 2020
GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) 

nvcc -V

The command above shows the NVIDIA CUDA compiler (nvcc) version. In my case:

Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

Step 13 :
Install and run some samples.
This step is completely optional, but it was mentioned in the NVIDIA guide, so I followed it.

cuda-install-samples-11.2.sh <dir>

The command above installs the samples into the specified directory, creating an NVIDIA_CUDA-11.2_Samples folder there.
For example: cuda-install-samples-11.2.sh ~ creates ~/NVIDIA_CUDA-11.2_Samples, which is the path the steps below assume.

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev libglfw3-dev libgles2-mesa-dev

The command above installs some additional libraries needed to build the samples. Keep in mind that the exact set can vary between installations/systems; check the error logs when running the make command below.

Find the Makefile in the directory where you installed the samples:

vim ~/NVIDIA_CUDA-11.2_Samples/Makefile

and edit line 41 to read:

FILTER_OUT := 0_Simple/cudaNvSci/Makefile

This filters out a sample that would otherwise produce errors/warnings during make.

Now, use make in samples directory.
You can use

make -k

or

GLPATH=/usr/lib make -k

Those two are not the same: the samples' Makefiles use the GLPATH variable to locate the OpenGL libraries, so the second form points them at /usr/lib.
The -k flag tells make to keep going even after some errors. NVIDIA says that errors are a completely normal thing right now because something is wrong with the samples themselves, so you can carry on without worrying about them (as of January 2021).

The compiled samples end up in the bin folder inside the samples directory.
Now cd to ~/NVIDIA_CUDA-11.2_Samples/1_Utilities/deviceQuery and run:

./deviceQuery

The command above gives info about the CUDA-capable devices. The last lines of my result:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.2, CUDA Runtime Version = 11.2, NumDevs = 1
Result = PASS

Now cd to ~/NVIDIA_CUDA-11.2_Samples/1_Utilities/bandwidthTest and run:

./bandwidthTest

The command above should show your, well… bandwidth.

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(GB/s)
   32000000			156.5

Result = PASS

And so this is the result of two days of tinkering and head-scratching. Once again, credits to Santosh KS and Isaac Kimsey.

That is all. How to set up Jupyter and everything else is coming in a few days, I hope. Happy computing. : )


Very nice! Do you guys still believe we should have our own setup, or run on AWS and similar services?

I think it’s all about pros and cons.
I prefer my own machine.

Pros:
With your own setup you don’t need to worry about every minute of CPU time and internet connection.
No one can ‘ban’ you from using your own machine.
No need for subscriptions, payments, confirmations and all of that. That is a big hurdle for me.

Cons:
Computing power is not as easily expandable as in online services, where you don’t worry about motherboards, power, cooling and all that.
The setup and the problems that come with it can be a big wall to get over.
You still pay for electricity.

So, the usual answer: it depends.