Hey all! I just wanted to share a dockerfile I wrote for my workstation and a little tutorial on how to use it. I know that there are already some dockerfiles out there, but this one uses a pre-built pytorch image from nvidia. All the nitty-gritty stuff is done by nvidia, so it’s mostly just installing the fast.ai library on top of that. There’s also a README with some advice and an explanation of why I prefer docker over isolation with virtual environments.
On your host system (the OS on your computer) you just need to install the latest driver for your GPU. What do you see when you open a terminal and type nvidia-smi? You don’t need to install any CUDA or cuDNN stuff. It’s all done automatically inside the container. Did you try to stick to the tutorial in the repo?
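If you would rather check the driver from a script than read the table by eye, here is a minimal sketch. The helper name driver_version is made up for illustration; it only assumes that the nvidia-smi banner contains a "Driver Version: X.Y" field, as in the output quoted later in this thread.

```shell
# Print the driver version found in nvidia-smi's banner text (read from stdin).
# Hypothetical helper name; prints nothing if no version field is present.
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Typical use on the host (requires the NVIDIA driver to be installed):
#   nvidia-smi | driver_version
```

An empty result means either that nvidia-smi is missing or that its output format differs from what this sketch assumes.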
I’m following your tutorial. There is just one part that is not clear to me: in the Prerequisites section, you say we need to have Linux installed and the GPU drivers installed. So that’s why I’m trying to install them.
But tell me, how do you run Linux on your system? So far I’ve been installing it on a USB stick with a persistent partition and booting from that, which works nicely. (The host PC runs Windows 10. I have not checked the Nvidia drivers there, but the GPU runs smoothly when my kid’s playing games.) When I open a terminal in my Ubuntu installation and type nvidia-smi, I see the usual table with details about my card. I can post a screenshot if needed.
Maybe I should install Ubuntu on a partition of one of my disks? Yesterday it looked like my USB stick (62 GB) was running out of space for some reason (it seems FAT32 can only handle 4 GB).
Sorry, this is the first time I’ve tried to install Linux. Thanks for your precious help.
No problem! I don’t know if FAT32 can be an issue, but the images are definitely pretty big (more than 4 GB). I personally would recommend installing it on a hard disk; it’s also going to be faster when you read images from disk. The driver on Windows does not matter. Once nvidia-smi outputs the table, you should be ready to go. Do you run into any errors while following the tutorial?
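On the FAT32 question: the 4 GB figure is a ceiling on the size of a single file (4 GiB minus one byte), not on the whole stick. Here is a small sketch of that check. The function name fits_on_fat32 is hypothetical, and whether any particular file produced by docker actually hits the limit is not something this thread establishes.

```shell
# Largest file FAT32 can store: 2^32 - 1 bytes.
FAT32_MAX_FILE_BYTES=4294967295

# Succeeds if a file of $1 bytes would fit in a single FAT32 file.
# Hypothetical helper, for illustration only.
fits_on_fat32() {
    [ "$1" -le "$FAT32_MAX_FILE_BYTES" ]
}

# To see which filesystem a path lives on (GNU df):
#   df -T /path/to/check
```

A 62 GB stick can still fill up for other reasons, for example a small persistent partition, so this only rules one cause in or out.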
The fastai tests and the lesson 1 notebook all ran fine for me when I built an image from your dockerfile.
To make it easier to poke around inside the container, I modified the dockerfile to get a terminal when the container starts.
So, for now I start the notebook using an alias placed in the container’s bashrc.
It is convenient to just ctrl-c to stop the notebook.
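The post does not show the alias itself, so here is one way such a shortcut might look. It is written as a shell function rather than an alias so that it can take an optional port. The name start_notebook and the default port 8888 are assumptions, not taken from the dockerfile.

```shell
# Hypothetical helper for the container's ~/.bashrc; name and defaults are
# assumptions, not part of the original dockerfile.
start_notebook() {
    # --ip=0.0.0.0 lets the server be reached through the published port;
    # --allow-root is needed when the container user is root.
    jupyter notebook --ip=0.0.0.0 --port="${1:-8888}" --no-browser --allow-root
}

# Inside the container:
#   start_notebook        # default port 8888
#   start_notebook 9999   # custom port
```

Because the notebook runs in the foreground of that shell, ctrl-c stops it and returns you to the prompt.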
There is probably a more elegant way to get shell access and automatically start the notebook.
One such approach might be using attach.
My first attempt at attaching to the container from another terminal just mirrored the first session where I had manually started the notebook.
I have given up for now on trying to eliminate the Nvidia HDMI audio driver that takes over 8 of the 16 PCIe lanes on slot 1.
I rebuilt ubuntu on my compute box a couple of different ways.
After upgrading the motherboard BIOS, I found no option that helped.
The card just seems to want it that way.
Anyway, yesterday I started Lesson 1.
Have you tried jupyter lab in lieu of jupyter notebook?
Supposedly it is now pretty stable.
I have partitioned my M.2 SSD to install Ubuntu 16.04.
When I enter nvidia-smi in a terminal, I get the usual table with details (Driver Version: 384.130).
So I started to follow your tutorial, and all was fine until this line:
#Install nvidia-docker2 and reload the Docker daemon configuration
apt-get install -y nvidia-docker2
I need to put sudo in front or else I can’t run it, but I still get these errors:
Reading package lists… Done
Building dependency tree
Reading state information… Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
nvidia-docker2 : Depends: docker-ce (= 18.03.1~ce-0~ubuntu) but 18.05.0~ce~3-0~ubuntu is to be installed or
docker-ee (= 18.03.1~ee-0~ubuntu) but it is not installable
E: Unable to correct problems, you have held broken packages.
All previous commands in your instructions returned OK, although I had to put sudo in front of some of them.
I can google the error above, but I’d prefer to wait for a while.
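The message above says that nvidia-docker2 is pinned to one exact docker-ce version while apt wants to install a newer one. One common way out of this kind of mismatch is to install the exact version named in the message. The sketch below only pulls that version string out of the error text. The helper name is hypothetical, and the install commands in the comments are a suggestion, not something confirmed in this thread.

```shell
# Extract the exact docker-ce version that nvidia-docker2 asks for from apt's
# "unmet dependencies" message (read from stdin). Hypothetical helper name.
required_docker_ce() {
    sed -n 's/.*Depends: docker-ce (= \([^)]*\)).*/\1/p' | head -n 1
}

# Possible follow-up on the host (not run here):
#   apt-cache madison docker-ce      # confirm the version is available
#   sudo apt-get install docker-ce=18.03.1~ce-0~ubuntu nvidia-docker2
# If a newer docker-ce is already installed, this is a downgrade and apt
# will ask for confirmation.
```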
@tthrift To access the container you can also do the following: type docker ps on your host system. That will show you all running containers. The following command will open a new shell inside the container: docker exec -it b9c /bin/bash . The string b9c is the first 3 characters of the container ID. You just need to type as many characters as needed to identify a unique container. So just b would also be enough since I’m running only this one container.
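This also explains the mirroring seen with attach: docker attach connects your terminal to the container's main process, while docker exec starts a separate process inside the same container. A minimal wrapper around the command above might look like this. The function name container_shell is made up for illustration.

```shell
# Open an interactive bash shell in a running container.
# $1 is a container ID, or a unique prefix of one, as listed by `docker ps`.
# Hypothetical helper name.
container_shell() {
    docker exec -it "$1" /bin/bash
}

# On the host:
#   docker ps              # find the container ID
#   container_shell b9c    # open a second shell in that container
```

Exiting this shell leaves the container and the notebook in the first session running.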
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.37/info: dial unix /var/run/docker.sock: connect: permission denied
@Hugues1965 On my system a docker group already existed as well. AFAIK, that should be OK. $USER is an environment variable that the shell expands to the name of the currently logged-in user. Any user that you add to the docker group should not have to use sudo when calling docker. The links that I included give additional information.
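For anyone who hits the same permission denied message, here is a rough sketch of the check and the fix described above. The helper name has_group is hypothetical. The usermod line in the comments is the usual way to add a user to an existing group, so treat it as a suggestion and compare it with the linked instructions.

```shell
# Succeeds if the space-separated group list on stdin contains group $1.
# Hypothetical helper name.
has_group() {
    tr ' ' '\n' | grep -qx "$1"
}

# On the host (not run here):
#   id -nG "$USER" | has_group docker || sudo usermod -aG docker "$USER"
# Group membership is read at login, so log out and back in before
# retrying docker without sudo.
```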