Hello community. Over the last two days I managed to set up my own machine learning … machine.
I simply want to share my steps; they may be useful for someone.
Please note that I am a newbie to machine learning (as of Jan 2021). I can’t give any solid advice, even though I’d love to. This guide is aimed at people who don’t really know what to do (same as me).
First, credit where it is due. A big “Thank You” to the authors of the pages below. I was able to finish this setup by following their advice.
Santosh KS
Build your own Deep Learning + Ubuntu Machine | by santhosh K.S | Medium
Isaac Kimsey
Nvidia installation guide
My PC is nothing special: I use a small SSD to boot the system and a larger HDD for files, games, movies, etc. The graphics card is a GTX 1060 6GB and the processor is a mid-to-high-range Intel.
Step 0 :
Before starting, I went to the NVIDIA installation guide above and checked the list of supported video cards.
I was wondering whether I should install Ubuntu 18.04 LTS or 20.04 LTS. People were saying that 20.04 works well, so I wanted to get the latest version, but in the NVIDIA guide I found that 20.04 has limited validation (whatever that means), so I went with the safer option.
Step 1 :
I disconnected the power to my video card, as this should make the rest of the installation easier. I have no idea why. If there is someone here who wants to share their knowledge - you are welcome.
Step 2 :
I installed Ubuntu on the SSD and reserved about 50 GB of space for the system.
sudo apt-get update
sudo apt-get upgrade
The commands above refresh the package lists and upgrade the installed packages.
Step 3 :
Installed libraries
sudo apt-get install vim csh flex gfortran libgfortran3 g++ \
cmake xorg-dev patch zlib1g-dev libbz2-dev \
libboost-all-dev openssh-server libcairo2 \
libcairo2-dev libeigen3-dev lsb-core \
lsb-base net-tools network-manager \
git-core git-gui git-doc xclip gdebi-core
The command above installs some general packages needed for coding, ML, etc.
It takes care of the gcc compiler too.
gcc --version
The command above verifies that gcc is installed. It is required for development using CUDA. It is supposed to be installed automatically on Linux, but it was missing in my case (or the command above was searching in the wrong place).
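If gcc turns out to be missing, installing it by hand should be enough. A minimal sketch, assuming the standard Ubuntu repositories:
sudo apt-get install build-essential
gcc --version
build-essential pulls in gcc, g++ and make, which the later steps rely on.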
Step 4 :
I validated that my compiler and the Boost library are working properly.
Create a file called ‘test_boost.cpp’ and add the following code:
#include <boost/lambda/lambda.hpp>
#include <iostream>
#include <iterator>
#include <algorithm>
int main()
{
    using namespace boost::lambda;
    typedef std::istream_iterator<int> in;

    std::for_each(
        in(std::cin), in(), std::cout << (_1 * 3) << " " );
}
Compile and execute the program.
g++-7 test_boost.cpp -o test_boost
echo 3 6 9 | ./test_boost
The output should be 9 18 27
The instructions above are copied from Isaac Kimsey’s article.
The output was correct in my case.
Step 5 :
Set up kernel headers and development packages for CUDA.
My basic understanding, based on reading a few pages: programs and the Linux kernel communicate through interfaces, and the kernel headers tell programs how to communicate with the kernel (and which version to use). If I am wrong, please correct me, I am but a humble beginner.
uname -r
The command above shows the kernel version. Both the kernel headers AND the development packages need to match it. When you upgrade, install something, or do whatever else, the versions can change. Remember to check that the kernel headers and development packages still match the running kernel after such operations.
sudo apt-get install linux-headers-$(uname -r)
The command above installs the correct version of both.
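If you want to double-check that the headers really match the running kernel, something like this should work (assuming dpkg, which Ubuntu uses for packages):
dpkg -l linux-headers-$(uname -r)
The package should show up as installed (ii) with a version that matches the output of uname -r.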
Step 6 :
Now I installed CUDA itself.
In the link above there are options for operating system, architecture and installation type.
In my case it was Ubuntu 18.04, x86_64, deb (network).
If you choose another installation method, the steps below will be different. You can find them in the NVIDIA installation guide.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
The commands above download and install CUDA. They depend on the choices made before, so they are valid only for this exact case (Ubuntu 18.04, x86_64, deb network).
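As far as I understand, the cuda meta-package from the deb (network) repository also pulls in a matching NVIDIA driver, so no separate driver installation should be needed. You can check which driver package ended up installed with something like:
dpkg -l | grep -i nvidia-driver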
Optional (important) step :
Turn off/disable the nouveau drivers. Those drivers were not loaded in my case, so I skipped this. You can check by using the command lsmod | grep nouveau (no output means nouveau is not loaded).
If nouveau is loaded, edit the GRUB configuration file “/etc/default/grub”:
vim /etc/default/grub
add the line
GRUB_CMDLINE_LINUX="modprobe.blacklist=nouveau nouveau.modeset=0 iommu=soft"
and regenerate the GRUB configuration:
sudo update-grub
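If I read the NVIDIA guide correctly, there is also an alternative way to disable nouveau by blacklisting the module instead of editing GRUB: create the file /etc/modprobe.d/blacklist-nouveau.conf with the contents
blacklist nouveau
options nouveau modeset=0
and then regenerate the initramfs:
sudo update-initramfs -u
After a reboot, lsmod | grep nouveau should print nothing.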
Step 7 :
Shut down the machine, connect the video card, and turn it back on.
Step 8 :
Set up the PATH variable.
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
The command above adds the CUDA binaries to $PATH.
echo $PATH
After running the command above there should be something like /usr/local/cuda-11.2/bin: at the beginning.
In my case I had to repeat this step after each reboot. The reason is that export only affects the current shell session, so each new terminal instance starts with an unmodified $PATH. To make the change permanent:
gedit ~/.bashrc
and add the line : export PATH="/usr/local/cuda-11.2/bin:$PATH"
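To check that the permanent change works, reloading the file in the current terminal and looking for nvcc should be enough (this assumes the install path from the steps above):
source ~/.bashrc
which nvcc
which nvcc should print /usr/local/cuda-11.2/bin/nvcc.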
Step 9 :
POWER9 setup.
This step may be unnecessary, as my CPU is not a POWER9, but I followed the NVIDIA recommended steps anyway.
systemctl status nvidia-persistenced
The command above checks whether nvidia-persistenced is enabled. In my case it shows something like Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled;
If it is not enabled:
sudo systemctl enable nvidia-persistenced
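As far as I know, enable only makes the service start on the next boot. To get it running right away, without a reboot, this should do it:
sudo systemctl start nvidia-persistenced
systemctl status nvidia-persistenced
The status output should now show Active: active (running).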
Step 10 :
Disable udev rule:
I will be honest - I have no idea (as of now) what it does.
You can read about it here if you want to.
The file is located at: /lib/udev/rules.d/40-vm-hotadd.rules
The rule that needs to be disabled: SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"
sudo cp /lib/udev/rules.d/40-vm-hotadd.rules /etc/udev/rules.d
The command above copies the rules file to /etc/udev/rules.d, which takes precedence over /lib/udev/rules.d.
sudo sed -i '/SUBSYSTEM=="memory", ACTION=="add"/d' /etc/udev/rules.d/40-vm-hotadd.rules
The command above deletes the memory rule from the copied file (note that it edits the copy in /etc/udev/rules.d, not the original in /lib), as the NVIDIA instructions suggest.
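To verify that the rule is really gone from the copied file, a quick check (my own addition, not from the NVIDIA guide):
grep 'SUBSYSTEM=="memory"' /etc/udev/rules.d/40-vm-hotadd.rules
No output means the memory rule was deleted.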
Step 11 :
Installing the persistence daemon :
/usr/bin/nvidia-persistenced --verbose
The command above needs to be run at boot time.
To do this, add it to the /etc/rc.local file.
There was no such file on my system, so I created it with touch /etc/rc.local. Maybe because it was a fresh install, I don’t know.
edit:
vim /etc/rc.local
add:
/usr/bin/nvidia-persistenced --verbose
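One thing to keep in mind: on Ubuntu 18.04, systemd only runs /etc/rc.local if the file is executable, and a freshly created file should start with a shebang line. A minimal sketch of how the file could look (the shebang and the exit line are my additions, not from the guide):
#!/bin/sh -e
/usr/bin/nvidia-persistenced --verbose
exit 0
and then make it executable:
sudo chmod +x /etc/rc.local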
Step 12 :
Reboot the machine and verify the installation.
cat /proc/driver/nvidia/version
The command above checks the driver version. In my case it shows:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 460.27.04 Fri Dec 11 23:35:05 UTC 2020
GCC version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
nvcc -V
The command above shows the NVIDIA CUDA compiler (nvcc) version. In my case it shows:
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
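Another quick check that is often used (it comes with the driver, not with the samples) is nvidia-smi, which should list the GPU, the driver version and the supported CUDA version:
nvidia-smi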
Step 13 :
Install and run some samples.
This step is completely optional, but it was mentioned in the NVIDIA guide, so I followed it.
cuda-install-samples-11.2.sh <dir>
The command above installs the samples into the specified directory.
For example : cuda-install-samples-11.2.sh /cudaSamples
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev libglfw3-dev libgles2-mesa-dev
The command above installs some additional libraries needed for the samples to work. Keep in mind that these can vary between installations/systems. Check the error logs when using the make command below.
Find the Makefile in the directory where you installed the samples and edit it:
vim ~/NVIDIA_CUDA-11.2_Samples/Makefile
edit line 41:
FILTER_OUT := 0_Simple/cudaNvSci/Makefile
This is to avoid some errors/warnings when using the make command.
Now, use make in the samples directory.
You can use
make -k
or
GLPATH=/usr/lib make -k
Those two are not the same. I don’t know the specifics.
The -k flag is needed to continue even after some errors. The NVIDIA guys say that errors are a completely normal thing right now and that they are there because something is wrong with the samples, so you can carry on without worrying about them. (as of January 2021)
The samples compile into the bin folder inside the samples directory.
Now cd to ~/NVIDIA_CUDA-11.2_Samples/1_Utilities/deviceQuery and run the command
./deviceQuery
The command above gives info about CUDA-capable devices. The last lines from my result:
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.2, CUDA Runtime Version = 11.2, NumDevs = 1
Result = PASS
Now cd to ~/NVIDIA_CUDA-11.2_Samples/1_Utilities/bandwidthTest and run the command
./bandwidthTest
The command above should show your, well… bandwidth.
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 156.5
Result = PASS
And so this is the result of two days of my tinkering and head-scratching. Once again - credits to Santosh KS and Isaac Kimsey.
That is all. How to set up Jupyter and everything else is coming in a few days, I hope. Happy computing. : )