I am experimenting with running all my deep learning stacks inside Docker containers. I need CUDA and other libraries for TensorFlow, PyTorch, and a stereo camera, and all three have strict restrictions on CUDA versions. In addition, Docker lets me run all three side by side and transfer the same development environment to AWS or another system.
I am using Fedora 26 on a Lenovo Legion Y520, which comes with an Nvidia 1050Ti GPU with 4GB of memory. This approach only requires installing the Nvidia drivers; no CUDA installation is needed, as the Nvidia runtime contains CUDA 8.
For anyone interested, here are the steps I followed:
I disabled the default Nouveau drivers and installed the latest Nvidia drivers (v384.90) by following this guide.
Installed Docker Community Edition.
Installed Nvidia Docker v2.0. It supplies CUDA libraries to any Docker image that is run with
--runtime=nvidia (see step 7 for more details).
Signed up for an Nvidia GPU Cloud (NGC) account. Signing up is free and required for downloading the PyTorch image (or any of the many other images available from NGC).
Created an API key from my NGC home page.
Ran docker login nvcr.io to log in to the nvcr.io repository. The username is “$oauthtoken” without the quotes, and the password is the API key generated in step 5.
Pulled the PyTorch docker image from nvcr.io using
docker pull nvcr.io/nvidia/pytorch:17.10.
Ran a Docker container from the image using
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -v $(pwd):/workspace --rm -it nvcr.io/nvidia/pytorch:17.10 (more details here).
Confirmed GPU availability inside the container by running
nvidia-smi, and then tested CUDA capability by running the MNIST example.
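Collected in one place, the steps above amount to only a few commands. The sketch below is a dry run that just echoes each command so nothing is pulled or executed by accident; remove the echo to run them for real. The 17.10 tag is the one used in this post and may be outdated.

```shell
#!/bin/sh
# Dry-run sketch of the workflow above: each command is echoed, not executed.
IMAGE="nvcr.io/nvidia/pytorch:17.10"

# Log in to the NGC registry (username is literally $oauthtoken,
# password is the API key from your NGC home page):
echo docker login nvcr.io

# Pull the PyTorch image:
echo docker pull "$IMAGE"

# Run a container with the GPU runtime, mounting the current directory:
echo docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all \
    -v "$(pwd)":/workspace --rm -it "$IMAGE"
```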
Well, with this you can also go to the NVIDIA GPU Cloud and get access to more powerful GPUs.
I’ve a question: is this similar to what AWS or Crestle are providing, i.e. cloud GPUs?
Nvidia’s cloud GPU offering runs atop AWS. Personally, I’m not a big fan of software built by traditional hardware companies. I’ve an AWS P2 instance and would use it or my laptop for the assignments. However, I signed up for NGC just to get their prebuilt Docker image, which I can run on any platform that has the Nvidia GPU drivers installed.
This was helpful. So Nvidia is just providing the Docker image/AMI; we still need to purchase a GPU-based system.
I’m trying to use Docker, but I’m a little bit confused. If I install a package via pip or conda, will it only be available inside the container?
Yes, if you install it inside the Docker container it will be accessible only there, and vice versa.
Can you elaborate on how I do that? I tried installing from the environment.yml file, but it didn’t work.
I probably misunderstood the question.
Are you inside the container interactively, or are you building a Dockerfile?
I followed the above instructions, the last thing I ran was
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -v $(pwd):/workspace --rm -it nvcr.io/nvidia/pytorch:17.10
So I’m inside the container now
Okay, so you can install anything additionally needed, just as on a regular machine (I don’t recall exactly what this image is missing for fastai).
But to make changes persistent:
docker ps -a to find the ID of the container you were working in
docker commit <container id> <your new image name>:<tag> to save that container as a new image
Docker is a more advanced approach, and there are always issues with Docker details and how it interacts with CUDA. So unless you’re prepared to spend some time debugging, and are interested in learning about this in particular, I’d suggest avoiding Docker for deep learning.
Here is where I might need help: where is that image stored? I would like to add the fast.ai library to it when it is built, but I can’t seem to find it on my computer.
There are two ways to do this that I know of.
A: Edit the Dockerfile that the image was built from (or an equivalent Dockerfile) and build a new image.
B: Run a container from the image, make your changes, and then commit the container to a new image.
Method B is faster; however, I prefer method A because what’s going on in the image becomes transparent and easier to understand. If you’d like to try method A, I suggest reading some of @hamelsmu’s Docker tutorial and then checking out my Dockerfile for fast.ai and the accompanying README for reference as you write your own.
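For method A, a minimal starting point could look like the sketch below. It writes a small Dockerfile extending the NGC image from earlier in this thread; the Dockerfile.fastai filename and the `pip install fastai` line are my assumptions, not taken from the author’s actual Dockerfile.

```shell
#!/bin/sh
# Write a minimal Dockerfile that extends the NGC PyTorch image.
cat > Dockerfile.fastai <<'EOF'
FROM nvcr.io/nvidia/pytorch:17.10
RUN pip install fastai
WORKDIR /workspace
EOF

# Build the new image (requires the Docker daemon, so it is left commented out):
# docker build -t my-fastai -f Dockerfile.fastai .
```

Once built, you would run it with the same `docker run --runtime=nvidia ...` command as before, just with your new image name.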
You could also use the paperspace docker container: https://hub.docker.com/r/paperspace/fastai/ .
I am not sure you have to set this ENV variable. I believe all the devices are visible in the container by default; at least that is what nvidia-smi is showing me.
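That matches my understanding; the variable is mainly useful when you want a container to see only some of the GPUs on a multi-GPU host. A dry-run sketch (the command is echoed, not executed, and GPU index 0 is just an example):

```shell
#!/bin/sh
# Dry run: print the command rather than executing it (no Docker needed).
# Drop the "echo" to run it for real on a machine with nvidia-docker2.
GPUS=0   # expose only GPU 0; "all" (the default in Nvidia's images) exposes every GPU

echo docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES="$GPUS" \
    --rm nvcr.io/nvidia/pytorch:17.10 nvidia-smi
```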