Yet another dockerfile

No problem! I don’t know whether FAT32 would be an issue, but the images are definitely pretty big (more than 4 GB). I would personally recommend installing it on a hard disk; it will also be faster when you read images from disk. The driver on Windows does not matter; once nvidia-smi outputs its table, you should be ready to go. Do you run into any errors while following the tutorial?

OK,
I will create a partition on one of my drives and install a fresh Ubuntu there,
then report any error messages here.

thanks again,

Searching on Google, there are different ways to install/run Ubuntu 16.04 under Windows 10.
I found this one:

Is this compatible with what we want to do with docker/fastai afterwards?
Do I need only the Ubuntu terminal window, or do I need the full Ubuntu desktop? How did you install it yourself?

You definitely need a full Ubuntu system installed on your machine. It does not work with virtual machines or the Ubuntu subsystem on Windows, because there is no way to pass the GPU through.

@kai Thanks for sharing your dockerfile and time.

The fastai tests and the lesson 1 notebook all ran fine for me when I built an image from your dockerfile.

To make it easier to poke around inside the container, I modified the dockerfile so that the container starts with a terminal.
So, for now I start the notebook using an alias placed in the container’s .bashrc.
It is convenient to just Ctrl-C to stop the notebook.
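Roughly, the change amounts to something like this (the image name and the jupyter flags here are placeholders for illustration, not kai’s exact dockerfile):

# at the end of the Dockerfile: define the alias and start a shell instead of the notebook
RUN echo 'alias nb="jupyter notebook --ip=0.0.0.0 --no-browser --allow-root"' >> /root/.bashrc
CMD ["/bin/bash"]

# then run the container interactively and type nb at the prompt; Ctrl-C stops the notebook
docker run --runtime=nvidia -it --rm -p 8888:8888 my-fastai-image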

There is probably a more elegant way to get shell access and automatically start the notebook.
One such approach might be using attach.
My first attempt at attaching to the container from another terminal just mirrored the first session where I had manually started the notebook.
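Apparently attach just connects you to the container’s main process, which would explain the mirroring; exec seems to be the way to get an independent shell (the container name below is a placeholder):

docker attach my-fastai-container              # joins the existing notebook session, so output is mirrored
docker exec -it my-fastai-container /bin/bash  # opens a separate shell alongside the notebook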

I have given up for now on trying to eliminate the Nvidia HDMI audio driver that takes over 8 of the 16 PCIe lanes on slot 1.
I rebuilt Ubuntu on my compute box a couple of different ways.
After upgrading the motherboard BIOS, I found no option that helped.
The card just seems to want it that way.

Anyway, yesterday I started Lesson 1.

Have you tried jupyter lab in lieu of jupyter notebook?
Supposedly it is now pretty stable.

OK, progress, I’m almost there, I think.

I have partitioned my M.2 SSD drive to install Ubuntu 16.04.
When I enter nvidia-smi in a terminal, I get the usual table with details (Driver Version: 384.130).
So I started to follow your tutorial; all was fine until this line:

#Install nvidia-docker2 and reload the Docker daemon configuration
apt-get install -y nvidia-docker2

I need to put sudo in front or else I can’t run it, but I still get these errors:

Reading package lists… Done
Building dependency tree
Reading state information… Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
nvidia-docker2 : Depends: docker-ce (= 18.03.1~ce-0~ubuntu) but 18.05.0~ce~3-0~ubuntu is to be installed or
docker-ee (= 18.03.1~ee-0~ubuntu) but it is not installable
E: Unable to correct problems, you have held broken packages.

All previous commands in your instructions returned OK; I had to place sudo in front of some of them.

I can google the error above, but I prefer to wait a while first.

@Hugues1965 In case it might be helpful to you:
these are my notes from installing Docker and nvidia-docker 2.

I used these after nvidia-smi showed that my nvidia driver was working.
However, I did this on an 18.04 LTS server with nvidia-headless-390 installed.

Good luck getting things up and running smoothly.

install docker - found at https://docs.docker.com/engine/userguide/

sudo apt update
sudo apt install docker.io
sudo systemctl start docker
sudo systemctl enable docker
docker --version
	==> 17.12.1-ce
test
sudo docker run --rm hello-world
remove need to use sudo for docker
sudo groupadd docker
sudo usermod -aG docker $USER
log out and back in to re-evaluate your group membership
docker run --rm hello-world

install Nvidia Docker v2 - found at https://github.com/NVIDIA/nvidia-docker

Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
Test nvidia-docker2
sudo nvidia-container-cli --load-kmods info
run interactive
docker run --runtime=nvidia --rm -it nvidia/cuda 
at container prompt
	nvidia-smi
	exit

@tthrift To access the container you can also do the following: type docker ps on your host system. That will show you all running containers. The following command will open a new shell inside the container: docker exec -it b9c /bin/bash . The string b9c is just the first three characters of the container ID; you only need to type as many characters as it takes to identify a unique container, so just b would also be enough since I’m running only this one container.
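So the whole sequence is simply this (b9c is only an example; use whatever prefix docker ps shows for your container):

docker ps                       # list running containers and their IDs
docker exec -it b9c /bin/bash   # open a new shell in the container whose ID starts with b9c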

@Hugues1965 It seems that docker is not installed correctly. What happens if you type docker info in a terminal and hit enter?


docker info gives me:

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.37/info: dial unix /var/run/docker.sock: connect: permission denied

sudo docker info gives me:

Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 18.05.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.13.0-45-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 125.9GiB
Name: hugues-MS-7B09
ID: 3Q7I:IFU6:…
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

When I run: sudo docker run --rm hello-world
the system says:

Hello from Docker!
This message shows that your installation appears to be working correctly.

My educated newbie guess is that nvidia-docker 2 expects docker-ce 18.03.1~ce but I have 18.05.0-ce? Should I follow Terry’s instructions to install nvidia-docker 2?

perhaps you just need to

remove need to use sudo for docker

sudo groupadd docker
sudo usermod -aG docker $USER

log out and back in to re-evaluate your group membership

docker run --rm hello-world

sudo grouped docker
returns:

sudo: grouped: command not found

sorry. my bad. When I proofed my notes I missed groupadd.
corrected. thx

hi @tthrift

sudo groupadd docker
returns:

groupadd: group ‘docker’ already exists

sudo user mod -aG docker $USER
returns:

sudo: user: command not found

Do I need to type it like this, or do I need to replace one of the two “user” strings with my user name?

@Hugues1965 On my system a docker group already existed as well; AFAIK that is OK. $USER is a shell variable that expands to the name of the currently logged-in user, so you can type the command exactly as written. Also note that the command is usermod, in one word, not user mod, which is why you got “user: command not found”. Any user you add to the docker group should no longer need sudo when calling docker. The links that I included give additional information.
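In other words, assuming you run these as the user that should get Docker access, something like this should be all that is needed:

sudo groupadd docker            # harmless if the group already exists
sudo usermod -aG docker $USER   # note: usermod is one word
# log out and back in (or run: newgrp docker) so the new group membership takes effect
docker run --rm hello-world     # should now work without sudo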

In order to get rid of the blocking message:

The following packages have unmet dependencies:
nvidia-docker2 : Depends: docker-ce (= 18.03.1~ce-0~ubuntu) but 18.05.0~ce~3-0~ubuntu is to be installed or
docker-ee (= 18.03.1~ee-0~ubuntu) but it is not installable

I was told by an nvidia-docker maintainer that edge releases are not supported, and that I should disable the edge channel in my Docker repository entry in /etc/apt/sources.list or /etc/apt/sources.list.d/…
The exchange is here:

I did replace edge with stable in that file, but the error message remains the same!?
I would really like to move on with the fastai course and tests this weekend…

I finally managed to install without Docker by following the fastai Paperspace script on this page:
http://files.fast.ai/setup/paperspace

It went flawlessly.
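For anyone curious, I just downloaded and ran the script, roughly like this (worth reading it first before executing; it may ask you to reboot at the end):

curl http://files.fast.ai/setup/paperspace -o setup.sh   # download the fast.ai setup script
less setup.sh                                            # have a look at what it does
bash setup.sh                                            # run it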

:nerd_face::tada::clap::clap::clap::confetti_ball: Congratulations!
You got it done. Excellent.

Congrats! I guess the Docker version is more complicated. The reason I use it is that I want to isolate other environments from the pytorch/fast.ai installation. Have fun with the course!

I would have liked to use Docker too, for the same reason, but after 3 weeks I had to get going.

But I will revisit it; I think the solution to my problem is simple. According to a Docker maintainer, I had to remove my Docker CE installation first (it was the edge version) and re-install the stable version.

I am not sure why I ended up with the edge version, but indeed my docker.list file contained edge instead of stable.

The issue resolution is here for those who bump into the same thing:
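For reference, the fix boils down to something along these lines (docker.list is where the repository entry lives on my system; adjust the path if yours differs):

# switch the Docker apt entry from the edge to the stable channel
sudo sed -i 's/edge/stable/' /etc/apt/sources.list.d/docker.list
# remove the edge build of Docker CE, then install stable docker-ce plus nvidia-docker2
sudo apt-get remove docker-ce
sudo apt-get update
sudo apt-get install docker-ce nvidia-docker2
sudo pkill -SIGHUP dockerd     # reload the Docker daemon configuration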

I have got everything working with this dockerfile, so it is loading cats! Hooray!

However,
I quickly ran into a problem as I started in with the ConvLearner. It downloads the resnet34 file correctly into the container, but it keeps throwing the same error (even after rebuilding).

UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xff in position 30: invalid start byte

Any hints? I am guessing there is data it is not seeing on the host or in the container.

Edit: I needed to run git pull as suggested by this thread.