Docker for fast.ai

Not sure about your VirtualBox setup. But even with a direct install, you need a properly installed driver before @Matthew's setup instructions will work. You will know that your driver is properly installed when nvidia-smi shows output something like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 41%   58C    P2    29W / 120W |   2833MiB /  6075MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1052      G   /usr/lib/xorg/Xorg                           177MiB |
|    0      2346      G   compiz                                       129MiB |
|    0      6777      C   /opt/conda/envs/fastai/bin/python           2507MiB |
|    0     12352      G   /opt/teamviewer/tv_bin/TeamViewer             14MiB |
+-----------------------------------------------------------------------------+
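A minimal scripted version of that check, if you want it in a setup script (the message strings are just my own, not nvidia-smi output):

```shell
# nvidia-smi exits non-zero when the driver is missing or not loaded,
# so its exit status alone is enough to gate the rest of a setup script.
if nvidia-smi > /dev/null 2>&1; then
    status="driver OK"
else
    status="driver missing or not loaded"
fi
echo "$status"
```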

Hi @P_Wes

I now got rid of VirtualBox and made a clean installation of Ubuntu 18.04 on a USB stick.
Managed to install the 390 Nvidia driver through the Ubuntu software updater.
Now when I run nvidia-smi I get:

So now I will proceed with the Docker installation.
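For anyone following along, the sequence at this step is roughly the one below: Docker via the convenience script, then nvidia-docker2 from NVIDIA's apt repository. This matches the official instructions as of mid-2018; check the current READMEs before pasting.

```shell
# Docker CE via the official convenience script
curl -fsSL https://get.docker.com | sudo sh

# Add NVIDIA's package repository and install nvidia-docker2
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Smoke test: this should print the same table as nvidia-smi on the host
sudo docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```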

TL;DR:

Yeah!
My own part 1 server ran the first cats and dogs lesson in a Docker container!

Experimenting via notebooks inside task-specific docker containers is absolutely the cat’s meow!

@Matthew Thanks for the dockerfiles/fastai and instructions!
When I got to that point, it worked perfectly to get me running.

@fastai-community Wow! Thanks for the zillions of posts helping each other (and, indirectly, me).
It took me a while to figure out how I wanted to approach using a personal machine.
All of your input was helpful.

A longer story:

A month or so ago I watched part 1.
I was hooked on fastai.

I haven’t written much code for a number of years.
So, getting up to speed will likely be slower than it once might have been. :wink:
Looking up every little thing uses a lot of time.

Given my anticipated speed I knew that I would quickly tire of paying for compute cycles by the hour.
Otherwise, I would have just used paperspace for part 1.

While researching what to build,
I found the posts of community members helping each other to be a wonderful resource.
Thanks to everyone!

Boy, it sure can be a time sink going down hardware-choice and software-configuration rabbit holes.
It can be tricky to determine the contexts in which community posts and internet search results apply.
I can see why it is recommended to start in the cloud.

However,

My own part 1 server is now running fastai!!!

As a quick test I just trained the first dogs and cats classifier in lesson 1.
So far, it’s working great!

Software:
dockerfiles/fastai
running on nvidia-docker 2
on Ubuntu 18.04 LTS server with Nvidia drivers
normally boots into the console for headless operation
manual invocation of GNOME for non-headless operation using the CPU’s integrated graphics
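Once nvidia-docker 2 is in place, launching the container looks something like this; note that the image tag and data path below are placeholders of mine, not the actual names from dockerfiles/fastai:

```shell
# Run the fastai image on the GPU via the nvidia runtime and expose Jupyter.
# "fastai:latest" and ~/data are hypothetical; substitute your own tag and path.
docker run --runtime=nvidia --rm -it -p 8888:8888 -v ~/data:/data fastai:latest
```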

Hardware:
single 1080Ti GPU
16GB system RAM
500GB NVMe SSD (Samsung 960 Evo M.2; installed Ubuntu here)
system: i7-6700K, Z170 (purchased from a local gamer who was moving back to Europe)
mechanical disk (installed windoze here for hardware testing/verification)

The hardware is not quite as nice/flexible as the system builds I spent week(s) researching.
But this way I saved some money and some additional time.
So far the only hardware I added was an SSD and some fans.
System RAM can be expanded if needed, and the 1080Ti will accept a watercooling modification.

In any event, the box seems to be performing its task of becoming a 1080Ti compute server!

Next up will be to establish an ssh tunnel to my laptop and some rudimentary temperature monitoring.
Oh. And finally start lesson 1!

At some point I should poke a hole in my router so I don’t have to be at home to work.
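For the record, the tunnel itself is a one-liner; the user and host below are placeholders:

```shell
# Forward the laptop's port 8888 to Jupyter on the server, with no remote shell (-N).
ssh -N -L 8888:localhost:8888 user@server
# Then open http://localhost:8888 in the laptop's browser.
```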

Sadly, I need to defer this for a week or so to tend to some other things.

I’m excited!

And thank you all!

-Terry-

@tthrift
Cool! How long does your machine take to run the 3 epochs of lesson 1?

My gear arrives tomorrow, so next weekend I’ll start my build.

@tthrift @Hugues1965 Have fun, I always love building my machines :star_struck: I’m running a 1080Ti with an i7-5930K and 32 GB RAM. I’m also using nvidia-docker, but with a pre-built pytorch image from nvidia. The 3 epochs (directly above the cyclic LR schedule image in the notebook) take 2:33. It would be interesting to see whether nvidia has added some “secret optimizing sauce” or whether it’s just the same as the cuda images from Docker Hub!

@Hugues1965 @kai I took a break from my other stuff and saw your messages. On my machine, the 3 epochs of the “quickstart” model ran in 7 seconds. The cyclic learn.fit(1e-2, 3, cycle_len=1) ran in 2:52. The scaling cyclic learn.fit(lr, 3, cycle_len=1, cycle_mult=2) ran in 12:40. Kai’s machine seems to run more quickly. I might try the nvidia/pytorch image on this machine sometime to compare apples to apples. (Go docker! It’s not a huge task to just try it.) It would seem that for the highest possible performance a person might need to compile binaries for their particular machine. For now I’m just thrilled that everything in lesson 1 seems to work!!!

edit: while training, the link status of my 1080Ti is 8x instead of 16x. Hmm, it’s in the correct slot.
sudo lspci -vv | grep -P "[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]|LnkSta:"

@tthrift Is the variable torch.backends.cudnn.enabled True? If you want to try the pre-built pytorch image from nvidia, I posted a tutorial on that:
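For anyone else checking this, one way to inspect that flag from a shell inside the container (assuming python and torch are on the PATH there):

```shell
# Prints the torch version, whether cuDNN is enabled, and whether CUDA is visible.
python -c "import torch; print(torch.__version__, torch.backends.cudnn.enabled, torch.cuda.is_available())"
```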


@kai Thanks for your interest and helpful input.

I may have asserted reduced bandwidth too quickly last night.
I don’t know yet.

torch.backends.cudnn.enabled ==> true
nvidia-smi finds the GPU and seems to look normal.
lspci finds 2 Nvidia controllers for the 1080ti.
(1 VGA compatible at 8x and 1 HDMI Audio at 8x)
For all I know, this is to be expected.
( sleepy + ignorance ~ mybad )
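One thing worth knowing here: GPUs can renegotiate their PCIe link as part of power management, so an idle card can legitimately report a narrower or slower link than the slot supports. Comparing LnkCap (what the slot can do) against LnkSta (the current negotiated state) while training is the more telling check. 01:00.0 is the bus id from the nvidia-smi output earlier in the thread; adjust it for your machine:

```shell
# LnkCap = maximum the link supports; LnkSta = currently negotiated state.
sudo lspci -vv -s 01:00.0 | grep -E "LnkCap:|LnkSta:"
```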

Since I haven’t yet tunneled out to my laptop, I’m running the fastai notebook in GNOME on the GPU machine. nvidia-smi shows the X server using the GPU (even though the only display is on the integrated graphics port). So that might be affecting things.

Does the pytorch image that you use happen to have the cuda dev tools installed?
The examples have some tests in them.
I’ll probably try loading a cuda dev docker image and run some performance tests.
But first I plan to set up an ssh tunnel, removing the Nvidia X server from the picture.

@tthrift Hmm, I’m pretty sure the X server should not be a big slowdown. When I view GPU usage with watch nvidia-smi I see around 50%.

Nvidia recently released Ubuntu 18.04 images with CUDA 9.2 on Docker Hub. Has anyone tried upgrading their images to them? Is it worth it? Does any fast.ai code break with the upgrade?

@xev In my image I’m using the images from their NGC registry. They have a monthly release cycle, and the May release uses 16.04. Maybe they will switch to 18.04 in the June release.

Well, I meant these images - https://hub.docker.com/r/nvidia/cuda/ - they already released 9.2-cudnn7-*-ubuntu18.04. Is it worth the trouble upgrading over the ubuntu 16.04 image?

@xev Ahh, sorry, I thought I was in my own Dockerfile thread. Personally, I don’t think running the same CUDA/cuDNN stack on 18.04 gives you any advantage over 16.04.

@Matthew, I’m a huge fan of your Docker container and have used it on several installs. It has made getting up and running with Fast.ai much easier than any other procedure for environment setup I’ve come across!

I just completed a fresh install, and for the first time, had to make a slight deviation from your setup instructions. The last item under “Assumptions” is to install nvidia-docker.

Today, when I ran the curl command, I noticed a strange HTML-dump output. The subsequent steps of the Ubuntu installation instructions did not work. For just the nvidia-docker install, I swapped out the instructions on the official repo for those on this page, and that seems to have done the trick.

The creator of the tutorial goes through installing docker before nvidia-docker. Those following your setup instructions will have already installed docker; they can scroll halfway down the page and pick up after

sudo docker run hello-world

I’m struggling with the login. I ssh into the server and have been able to set everything up; Jupyter is starting, but I can’t figure out where to find the token or password. Normally it’s displayed on the command line, and I thought it would be in the output shown below.

What am I missing? Is there a default password, or is the token displayed elsewhere?
Screenshot below.

Did you try jupyter notebook list?

Thanks, the problem was my inexperience with docker. I didn’t understand that to get a shell you use docker exec -it <containername> /bin/bash, and I tried using docker login instead.

Once I did, I was able to run jupyter notebook list, get the token, and log in.

Thanks for this Dockerfile. I used it with the NGC Pytorch container. I did have to modify the line that sets the IP address of the notebook:

echo "c.NotebookApp.ip = '0.0.0.0'" >> ~/.jupyter/jupyter_notebook_config.py && \

I tried rebuilding everything from scratch and ran into the following error. I cannot figure out why it won’t install; everything else seems to work fine in Jupyter if I comment out that line. I also tried moving it around.

The command '/bin/sh -c /opt/conda/bin/conda install --name fastai -c fastai torchvision-nightly' returned a non-zero code: 1

Any ideas on how to fix?

Screenshot below:

I’ve had some issues with the paperspace version, so I built my own. Maybe it’s of use to someone. It now supports both GPU and CPU-only hosts: https://github.com/morishuz/fastai-docker
