I forgot to mention that VirtualBox and VMware cannot do PCI passthrough. To achieve that, your only practical option is to use containers.
Personally I use Linux Containers (LXC/LXD) instead of Docker because it adds essentially zero overhead.
Containers let you pass the GPUs straight through to them. Since I discovered this tool I have never installed anything on my host machine; I just configure a new environment and use the software inside the container.
Here is my basic tutorial for it.
You will need to install the snapd package manager (a newer way to install software on Linux where nobody can tamper with a package's file system). To do so you need some prerequisites.
1 - Prerequisites
NOTE: If you have Ubuntu as your base OS, remove the previously installed LXD so you can install the latest version from snap.
1 - Install the NVIDIA drivers properly: at the time of this tutorial, nvidia-410 was the latest version.
Note: Do not use the driver from the “NVIDIA-Linux-x86_64-XXX.XXX.run” installer, as it probably won’t work.
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt upgrade
sudo apt install libcuda1-410 libxnvctrl0 nvidia-410 nvidia-410-dev nvidia-libopencl1-410 nvidia-opencl-icd-410 nvidia-settings
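After the packages are installed (a reboot may be needed for the new driver to load), it is worth confirming the driver works on the host before touching containers. A quick optional check:
# Should list your GPU(s) and a 410.xx driver version
nvidia-smi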
2 - We will need nvidia-container-runtime so the GPUs available on the HOST can be exposed to the containers
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install -yq nvidia-container-runtime
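If you want to verify that the runtime components can actually see your GPUs, the nvidia-container-cli tool (which should be pulled in as a dependency of nvidia-container-runtime) can report them. This check is optional:
# Prints the driver/CUDA versions and every GPU the container runtime can expose
nvidia-container-cli info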
3 - Remove any older LXD from your machine (in case of Ubuntu)
This step is necessary because the LXD packaged with Ubuntu 18.04 is already quite old (v3.0.2), while the current release is v3.6.
sudo apt-get remove lxd lxd-client
4 - Install the snapd package manager (available in most distros)
## In case of Debian distros
sudo apt install snapd
## In case of RedHat distros
sudo yum install snapd
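On RedHat-based distros you may also have to enable the snapd socket before snap commands work; either way, a quick sanity check that snapd is responding looks like this:
# Needed on most RedHat-based distros (harmless if already enabled)
sudo systemctl enable --now snapd.socket
# Should print the snap, snapd and series versions
snap version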
5 - Install LXD (from here on, everything is the same on all distros):
sudo snap install lxd --channel=stable
6 - Change the REFRESH TIME to the last Friday of the month so automatic updates don't bother you (the refresh schedule is a system-wide snapd setting):
sudo snap set system refresh.timer=fri5,07:00-08:10
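If you want to confirm what snapd actually scheduled, it can print the timer together with the last and next refresh times:
# Shows the configured refresh timer and the next scheduled update window
snap refresh --time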
7 - Give root and your user permission to use containers
echo "$USER:1000000:65536" | sudo tee -a /etc/subuid /etc/subgid
echo "root:1000000:65536" | sudo tee -a /etc/subuid /etc/subgid
sudo usermod --append --groups lxd $USER
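The new lxd group membership only applies to new login sessions. If you don't want to log out and back in, one way to pick it up immediately is:
# Opens a subshell in which the lxd group is already active
newgrp lxd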
Extra:
NOTE: ZFS is considered one of the best file systems available today. It is open source (it originated at Sun Microsystems, later Oracle), but because of licensing it usually does not come pre-installed, so you may need to install it manually on your system. It will be the file system backing the container storage pool. It has many features that Ext4 does not have, like snapshots and self-healing. I suggest using either ZFS or Btrfs.
Use this tutorial if you want to enable it: ZFS on Distros
8 - Initialize the server (it will ask a bunch of questions to define your container environment)
lxd init
9 - The questions and answers
Would you like to use LXD clustering? (yes/no) [default=no]: no
Do you want to configure a new storage pool? (yes/no) [default=yes]: yes
Name of the new storage pool [default=default]: default
If ZFS isn't available, use btrfs:
Name of the storage backend to use (btrfs, ceph, dir, lvm, zfs) [default=zfs]: zfs
Create a new ZFS pool? (yes/no) [default=yes]: yes
Would you like to use an existing block device? (yes/no) [default=no]: no
For the next question: if you have 100 GB free on your disk or SSD, I recommend entering at most 90 GB so you never reach the limit.
Size in GB of the new loop device (1GB minimum) [default=16GB]: 50
Would you like to connect to a MAAS server? (yes/no) [default=no]: no
Would you like to create a new local network bridge? (yes/no) [default=yes]: yes
What should the new bridge be called? [default=lxdbr0]: lxdbr0
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: auto
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: none
Would you like LXD to be available over the network? (yes/no) [default=no]: no
The next two questions only appear if you answered yes above:
Address to bind LXD to (not including port) [default=all]: all
Port to bind LXD to [default=8443]: 8443
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: yes
Would you like a YAML “lxd init” preseed to be printed? (yes/no) [default=no]: no
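If you answer yes to that last question, LXD prints the whole configuration as a preseed file, which you can feed to lxd init --preseed on another machine to skip the interactive questions. A rough sketch of what the preseed for the answers above might look like (the exact fields vary between LXD versions, so treat this as an illustration, not the literal output):
cat <<EOF | lxd init --preseed
config: {}
networks:
- name: lxdbr0
  type: bridge
  config:
    ipv4.address: auto
    ipv6.address: none
storage_pools:
- name: default
  driver: zfs
  config:
    size: 50GB
profiles:
- name: default
  devices:
    eth0:
      name: eth0
      nictype: bridged
      parent: lxdbr0
      type: nic
    root:
      path: /
      pool: default
      type: disk
EOF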
10 - It may be necessary to restart your machine once. To check whether the server is online you can run:
lxc version
If the server is online and running you will see output like this:
Client version: 3.6
Server version: 3.6 ← this line shows the service started successfully
2 - Launching a Container with GPU
1 - Launch a new container and map your local user ID to the internal “ubuntu” user ID:
I will use c1 as the container name from now on.
lxc launch ubuntu:16.04 c1
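If you want to check that the container came up before continuing, lxc list shows its state:
# The new container should appear with state RUNNING and an address on lxdbr0
lxc list c1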
2 - Stop the container to apply further configuration
lxc stop c1
3 - Map your UID and GID to the UID and GID of the default user of the container (the ubuntu user inside the container)
This way, if you later map your personal folder inside the container (an example is shown right after the command below), you won't run into permission problems on your files.
echo "uid $(id -u) 1000\ngid $(id -g) 1000" | lxc config set c1 raw.idmap -
4 - Pass some or all GPU(s) through to the container:
This command makes your HOST driver available inside the container, so you don't even need CUDA installed on your host machine; you only need to install the driver on the host.
lxc config set c1 nvidia.runtime true
This command maps only the specified GPU to the container; if you want all of them, just remove the id parameter (see the variant shown after the command below).
lxc config device add c1 mygpu gpu id=0
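If you prefer to expose every GPU, use this alternative to the command above (not in addition to it), which adds the device without an id filter:
# Passes all host GPUs into the container
lxc config device add c1 mygpu gpu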
5 - Start the container again:
lxc start c1
6 - Set the password for the default user in the container:
lxc exec c1 -- bash -c 'passwd ubuntu'
7 - Test whether your GPU is already working inside the container:
lxc exec c1 -- bash -c 'nvidia-smi'
8 - Go to the console of your container (you may need to hit ENTER twice).
Log in as the ubuntu user with the password you created previously:
lxc console c1
NOTE: LXD has many commands that you will need to familiarize yourself with.
lxc --help
Description:
  Command line client for LXD

  All of LXD's features can be driven through the various commands below.
  For help with any of those, simply call them with --help.

Usage:
  lxc [command]

Available Commands:
  alias       Manage command aliases
  cluster     Manage cluster members
  config      Manage container and server configuration options
  console     Attach to container consoles
  copy        Copy containers within or in between LXD instances
  delete      Delete containers and snapshots
  exec        Execute commands in containers
  export      Export container backups
  file        Manage files in containers
  help        Help about any command
  image       Manage images
  import      Import container backups
  info        Show container or server information
  launch      Create and start containers from images
  list        List containers
  move        Move containers within or in between LXD instances
  network     Manage and attach containers to networks
  operation   List, show and delete background operations
  profile     Manage profiles
  project     Manage projects
  publish     Publish containers as images
  remote      Manage the list of remote servers
  rename      Rename containers and snapshots
  restart     Restart containers
  restore     Restore containers from snapshots
  snapshot    Create container snapshots
  start       Start containers
  stop        Stop containers
  storage     Manage storage pools and volumes
  version     Show local and remote versions

Flags:
      --all           Show less common commands
      --debug         Show all debug messages
      --force-local   Force using the local unix socket
  -h, --help          Print help
  -q, --quiet         Don't show progress information
  -v, --verbose       Show all information messages
      --version       Print version number

Use "lxc [command] --help" for more information about a command.