Yet another dockerfile

Hey all! I just wanted to share a Dockerfile I wrote for my workstation and a little tutorial on how to use it. I know there are already some Dockerfiles out there, but this one uses a pre-built PyTorch image from NVIDIA. All the nitty-gritty stuff is done by NVIDIA, so it’s mostly just installing the fast.ai library on top of that. There’s also a README with some advice and an explanation of why I prefer Docker over isolation with virtual environments.
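The core of it is really just a few lines on top of NVIDIA’s image. Roughly like this (a minimal sketch — the tag and pip packages here are illustrative, the repo has the real file):

# Minimal sketch (assumed tag and package names; see the repo for the real Dockerfile)
# start from NVIDIA's pre-built PyTorch image
FROM nvcr.io/nvidia/pytorch:18.05-py3
# install the fast.ai library on top
RUN pip install fastai
WORKDIR /workspace
EXPOSE 8888
# start the notebook server when the container runs
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--allow-root", "--no-browser"]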

Have fun with the course!


Hey @kai

Which version of Ubuntu did you use successfully for this? 16.04? 18.04?

My computer is now ready, with an Nvidia Titan Xp.

Hey @Hugues1965, I’m using 16.04 as host. But 18.04 should also work! Just post here if something is not working.

Ok, I’ve got 16.04 installed,
but then I ran into problems after installing the Nvidia drivers.
How did you install the Nvidia drivers and CUDA?
Which driver and CUDA versions?

On your host system (the OS on your computer) you just need to install the latest driver for your GPU. What do you see when you open a terminal and type nvidia-smi? You don’t need to install any CUDA or cuDNN stuff; it’s all done automatically inside the container. Did you try to stick to the tutorial in the repo?
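For reference, this is the only host-side check that matters:

nvidia-smi
# should print a table with your driver version and the GPU listed;
# no CUDA toolkit or cuDNN is needed on the host itself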

Hi @kai,

thanks for your patience,

I’m following your tutorial. There’s just one part that isn’t clear to me: in the Prerequisites section you say we need Linux installed and the GPU drivers installed, so that’s why I’m trying to install them.

But tell me, how do you run Linux on your system? So far I’ve been installing it on a USB stick and booting from it, with a persistent partition, which works nicely (the host PC runs Windows 10; I haven’t checked the Nvidia drivers there, but the GPU runs smoothly when my kid plays games). When I open a terminal from my Ubuntu installation and type nvidia-smi, I see the usual table with details about my card. I can post a screenshot if needed.

Maybe I should install Ubuntu on a partition of one of my disks? Yesterday it looked like my USB stick (62 GB) was running out of space for some reason (it seems FAT32 can only handle 4 GB per file).

Sorry, this is the first time I’ve tried to install Linux. Thanks for your precious help,

No problem! I don’t know whether FAT32 is the issue, but the images are definitely pretty big (more than 4 GB). I would personally recommend installing it on a hard disk; it will also be faster when you read images from disk. The driver on Windows does not matter: when nvidia-smi outputs the table, you should be ready to go. Do you run into any error while following the tutorial?
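If you want to check whether the disk you install to has enough room, Docker keeps its images under /var/lib/docker by default, so something like this gives a quick idea:

df -h /             # free space on the root filesystem (where /var/lib/docker lives by default)
docker system df    # how much space Docker images and containers use, once you have pulled some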

Ok,
I will create a partition on one of my drives and install Ubuntu fresh there,
then I will report any error messages here.

thanks again,

Searching on Google, there are different ways to install/run Ubuntu 16.04 under Windows 10.
I found this one:

Is this compatible with what we want to do with docker/fastai afterwards?
Do I need the Ubuntu terminal window only, or do I need the full Ubuntu desktop? How did you install it yourself?

You definitely need a full Ubuntu system installed on your machine. It won’t work with virtual machines or the Windows Subsystem for Linux, because there is no way to pass the GPU through.

@kai Thanks for sharing your dockerfile and time.

The fastai tests and the lesson 1 notebook all ran fine for me when I built an image from your dockerfile.

To make it easier to poke around inside the container, I modified the dockerfile to get a terminal when the container starts.
So, for now I start the notebook using an alias placed in the container’s .bashrc.
It is convenient to just press Ctrl-C to stop the notebook.
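Roughly what I changed, from memory (the alias name is just what I picked):

# at the end of the Dockerfile: define a shortcut and drop into bash instead of starting the notebook
RUN echo "alias nb='jupyter notebook --ip=0.0.0.0 --allow-root --no-browser'" >> /root/.bashrc
CMD ["/bin/bash"]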

There is probably a more elegant way to get shell access and automatically start the notebook.
One such approach might be using attach.
My first attempt at attaching to the container from another terminal just mirrored the first session where I had manually started the notebook.

I have given up for now on trying to eliminate the Nvidia HDMI audio driver that takes over 8 of the 16 PCIe lanes on slot 1.
I rebuilt ubuntu on my compute box a couple of different ways.
After upgrading the motherboard BIOS, I found no option that helped.
The card just seems to want it that way.

Anyway, yesterday I started Lesson 1.

Have you tried JupyterLab in lieu of Jupyter Notebook?
Supposedly it is pretty stable now.
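If it works, I guess swapping it in inside the container would just be something like this (untested on my side):

pip install jupyterlab
jupyter lab --ip=0.0.0.0 --allow-root --no-browser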

Ok, progress, I’m almost there I think.

I have partitioned my M.2 SSD drive to install Ubuntu 16.04.
When I enter nvidia-smi in a terminal, I get the usual table with details (Driver Version: 384.130).
So I started to follow your tutorial; all was fine until this line:

# Install nvidia-docker2 and reload the Docker daemon configuration
apt-get install -y nvidia-docker2

I need to put sudo in front first or else I can’t run it, but I still get these errors:

Reading package lists… Done
Building dependency tree
Reading state information… Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
nvidia-docker2 : Depends: docker-ce (= 18.03.1~ce-0~ubuntu) but 18.05.0~ce~3-0~ubuntu is to be installed or
docker-ee (= 18.03.1~ee-0~ubuntu) but it is not installable
E: Unable to correct problems, you have held broken packages.

All the previous commands in your instructions returned ok; I had to place sudo in front of some of them.

I can google the error above, but I’d prefer to wait a while first.

@Hugues1965 In case it might be helpful to you,
these are my notes from installing Docker and nvidia-docker2.

I used these after nvidia-smi showed that my Nvidia driver was working.
However, I did this on an 18.04 LTS server with nvidia-headless-390 installed.

Good luck getting things up and running smoothly.

install docker - found at https://docs.docker.com/engine/userguide/

sudo apt update
sudo apt install docker.io
sudo systemctl start docker
sudo systemctl enable docker
docker --version
	==> 17.12.1-ce
test
sudo docker run --rm hello-world
remove need to use sudo for docker
sudo groupadd docker
sudo usermod -aG docker $USER
log out and back in to re-evaluate your group membership
docker run --rm hello-world

install Nvidia Docker v2 - found at https://github.com/NVIDIA/nvidia-docker

Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
Test nvidia-docker2
sudo nvidia-container-cli --load-kmods info
run interactive
docker run --runtime=nvidia --rm -it nvidia/cuda 
at container prompt
	nvidia-smi
	exit

@tthrift To access the container you can also do the following: type docker ps on your host system. That will show you all running containers. The following command will open a new shell inside the container: docker exec -it b9c /bin/bash. The string b9c is the first three characters of the container ID. You only need to type as many characters as needed to identify a unique container, so just b would also be enough since I’m running only this one container.
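In short:

docker ps                       # list running containers and their IDs
docker exec -it b9c /bin/bash   # open a second shell in the container whose ID starts with b9c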

@Hugues1965 It seems that docker is not installed correctly. What happens if you type docker info in a terminal and hit enter?


docker info gives me:

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.37/info: dial unix /var/run/docker.sock: connect: permission denied

sudo docker info gives me:

Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 18.05.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.13.0-45-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 125.9GiB
Name: hugues-MS-7B09
ID: 3Q7I:IFU6:…
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

When I run sudo docker run --rm hello-world,
the system says:

Hello from Docker!
This message shows that your installation appears to be working correctly.

My educated newb guess is that nvidia-docker2 expects docker-ce 18.03.1~ce but I have 18.05.0-ce? Should I follow Terry’s instructions to install nvidia-docker2?
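If that version mismatch is the problem, from what I see in the Docker docs it should be possible to install the exact docker-ce version that nvidia-docker2 asks for, something like this (not tested yet):

apt-cache madison docker-ce
# pick the version from the error message and install it explicitly
sudo apt-get install docker-ce=18.03.1~ce-0~ubuntu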

perhaps you just need to

remove need to use sudo for docker

sudo groupadd docker
sudo usermod -aG docker $USER

log out and back in to re-evaluate your group membership

docker run --rm hello-world

sudo grouped docker
returns:

sudo: grouped: command not found

Sorry, my bad. When I proofed my notes I missed groupadd.
Corrected, thanks.

hi @tthrift

sudo groupadd docker
returns:

groupadd: group ‘docker’ already exists

sudo user mod -aG docker $USER
returns:

sudo: user: command not found

Do I need to type it exactly like this, or should I replace one of the two “user” parts with my user name?

@Hugues1965 On my system a docker group already existed as well; AFAIK, that should be ok. The command is usermod, written as one word, which is why sudo complained that it couldn’t find “user”. $USER is a command-line convenience that grabs the name of the currently logged-in user from the environment, so you can leave that part exactly as written. Any and all users that you add to the docker group should not have to use sudo when calling docker. The links that I included give additional information.
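So the whole sequence, with usermod as a single word, looks like this:

sudo groupadd docker            # fine if it says the group already exists
sudo usermod -aG docker $USER   # add the current user to the docker group
# log out and back in so the group change takes effect, then test without sudo:
docker run --rm hello-world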