Docker support for V1

Hi guys, we are working to build out a new Docker container for V1 with PyTorch 1.0 here at Paperspace and would love some help/testing. For reference, we have been maintaining a Docker image for the current fastai course here.

The new V1 dockerfile is here: https://github.com/Paperspace/fastai-docker/blob/fastai/pytorch1.0/Dockerfile and here’s the built version on DockerHub under the tag paperspace/fastai:1.0-CUDA9.2.

In any case, I can confirm that PyTorch 1.0.0.dev20181017 is installed when pulling this image.

What would be a good smoke test to make sure that this is working correctly with all the latest libs? Do you see anything missing from the Dockerfile? Hopefully we can get a clean working container that can be generally used by people in this new course. Any feedback is much appreciated!

Dillon


You can run the tests and the example notebooks in the fastai repo. For a more complete test, run this in the fastai_docs repo
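For a quick smoke test, something along these lines should confirm the core libraries import and the GPU is visible from inside the container (a sketch, not an official procedure; the image tag is the one from the first post, and `--runtime=nvidia` assumes nvidia-docker 2 is set up on the host):

```shell
# Start the image and check that PyTorch and fastai import,
# and that CUDA is visible from inside the container.
docker run --runtime=nvidia --rm paperspace/fastai:1.0-CUDA9.2 \
  python -c "import torch, fastai; print(torch.__version__); print('CUDA:', torch.cuda.is_available())"
```

If this prints the expected PyTorch dev version and `CUDA: True`, the driver/CUDA pairing is at least sane.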

Got it! It’s working great :raised_hands:

For anyone else looking at this thread, CUDA 9.2 and 10.0 work great as base images, but CUDA 9.0 does not. It also seems to require a minimum NVIDIA driver version of 396.26, or it will hang (even if the container itself builds successfully).
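A small sketch for checking the host driver against that 396.26 minimum before building (the `nvidia-smi` query is shown commented out so the snippet runs anywhere; treat the whole thing as an illustration, not part of the image):

```shell
#!/bin/sh
# ver_ge A B succeeds when version A >= version B (relies on GNU `sort -V`).
ver_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

MIN=396.26
# On a host with the NVIDIA driver installed you could fetch the real version:
# DRIVER=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
DRIVER=410.48   # example value

if ver_ge "$DRIVER" "$MIN"; then
    echo "driver $DRIVER OK (>= $MIN)"
else
    echo "driver $DRIVER too old (< $MIN); the container may hang" >&2
fi
```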

I am currently using fastai on Paperspace, with a Linux 16 machine, and installed all the dependencies with conda.
I had the problem that installing fastai in editable mode (pip install -e) reinstalled a lot of dependencies already installed with conda (I was using the pip installed by conda).
Anyway, fastai works, but a Docker image would be much appreciated, to be able to change machines easily.
I also had the spacy bug.
I also tried the NVIDIA 410 driver and it also works great.

We’ve had nothing but headaches with docker, so we won’t be providing images. Anyone else is welcome to do so, of course!

@tcapelle we have been maintaining a docker container here. It has worked well and if you find anything that would make sense to change you can definitely submit a PR.

What about docker gave you headaches specifically? Was it building with a Dockerfile, getting things to run or performance issues?

I’m not Jeremy, but one headache is that the link above is no longer active…


This is what I’m currently using. Works for me! You have to register for the NVIDIA GPU Cloud (free) in order to get the batteries-included PyTorch image. Remove the last two rows if you don’t like the dark theme; I’m a :bat: when it comes to themes.

And of course you have to customize the volumes in the docker-compose file for whatever you want to have in your container. The SCRATCH folder you can see in the volumes is a directory I use as a layer below git to keep all my code synced between the different machines I use. I’ve heard several people question my sanity for that, but I actually haven’t had any corruption in several years.

dockerfile:

FROM nvcr.io/nvidia/pytorch:19.02-py3

LABEL maintainer="Kai Lichtenberg"

WORKDIR /

RUN git clone https://github.com/fastai/fastai &&\
    cd fastai &&\
    tools/run-after-git-clone &&\
    pip install -e ".[dev]" &&\
    mkdir course-v3 &&\
    jupyter notebook --generate-config

RUN conda uninstall -y --force pillow pil jpeg libtiff &&\
    pip uninstall -y pillow pil jpeg libtiff &&\
    conda install -y -c conda-forge libjpeg-turbo &&\
    export CFLAGS="${CFLAGS} -mavx2" &&\
    pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd &&\
    conda install -y -c zegami libtiff-libjpeg-turbo &&\
    conda install -y jpeg libtiff

RUN jupyter contrib nbextension install --user &&\
    pip install jupyterthemes &&\
    jt -t chesterish
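To verify the pillow-simd build actually took, you can check the Pillow version reported inside the container; as a convention (not an official API), pillow-simd releases carry a `.postN` suffix in their version string. A sketch:

```shell
#!/bin/sh
# pillow-simd versions look like "5.3.0.post0"; plain Pillow has no ".post".
is_simd() {
    case "$1" in
        *.post*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Inside the container you would feed it the real version:
# V=$(python -c "import PIL; print(PIL.__version__)")
V=5.3.0.post0   # example value
if is_simd "$V"; then
    echo "pillow-simd active ($V)"
else
    echo "plain Pillow ($V); the simd build may not have taken" >&2
fi
```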

docker-compose:

version: '2.3'
services:
  fastai:
    container_name: fastai
    runtime: nvidia
    entrypoint: ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
    image: fastai:latest
    shm_size: '16gb'
    ports: ['8888:8888']
    restart: always
    volumes: ["~/Dropbox/SCRATCH/fastai:/fastai",
              "~/Dropbox/SCRATCH/fastai_course-v3:/course-v3",
              "/data:/data",
              "~/Dropbox/SCRATCH:/SCRATCH",
              "~/Dropbox/SCRATCH/jupyter_notebook_config.py:/root/.jupyter/jupyter_notebook_config.py"]
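Assuming the Dockerfile and docker-compose file above sit in the same directory, building and starting it would look roughly like this (the `fastai:latest` tag is the one the compose file references; the token grep is just a convenience):

```shell
# Build the image under the tag the compose file expects, then start it.
docker build -t fastai:latest .
docker-compose up -d
# Jupyter is then reachable on http://localhost:8888
docker-compose logs fastai | grep token
```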

Is it still working as of today?
For some reason the line conda install -y -c zegami libtiff-libjpeg-turbo returns an error for me, something about conflicting versions.

Note: I moved this to my GitHub with more explanation, here:
https://github.com/free-soellingeraj/analyze
As of today, this one is working:

FROM nvcr.io/nvidia/pytorch:19.11-py3

LABEL maintainer="Aaron Soellinger <ajs.consult.llc@gmail.com>"

WORKDIR /

RUN apt update -y && apt upgrade -y \
        && apt install nodejs npm -y

RUN git clone https://github.com/fastai/fastai &&\
    cd fastai &&\
    tools/run-after-git-clone &&\
    pip install -e ".[dev]" &&\
    jupyter notebook --generate-config

RUN pip uninstall -y pillow pil jpeg libtiff &&\
    conda install -y -c conda-forge libjpeg-turbo &&\
    export CFLAGS="${CFLAGS} -mavx2" &&\
    pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd &&\
    conda install -y -c zegami libtiff-libjpeg-turbo &&\
    conda install -y jpeg libtiff

RUN jupyter contrib nbextension install --user &&\
    pip install jupyterthemes &&\
    jt -t chesterish

RUN pip install kaggle bcolz seaborn graphviz sklearn_pandas isoweek pandas_summary

With the docker-compose.yml:

version: "2.4"
services: 
  fastai:
    container_name: fastai-image 
    entrypoint: 
      - jupyter
      - lab
      - --ip=0.0.0.0
      - --no-browser
      - --port=8900
      - --notebook-dir=/
    image: fastai-image:latest
    ports: ["8900:8900"]
    runtime: nvidia
    restart: always
    volumes:
      - "~/code/analyze/data:/ws/data"
      - "~/code/analyze/nbs:/ws/nbs"
      - "~/code/analyze/lib:/ws/lib"
      - "~/code/analyze/jupyter_notebook_config.py:/root/.jupyter/jupyter_notebook_config.py"
      - "~/.kaggle:/root/.kaggle/"

Notes:

  • It’s not possible to use the latest docker-compose format (currently 3.7) because of this: https://github.com/NVIDIA/nvidia-docker/issues/935.
  • The usage of this configuration is that all the files you edit and want to save are created under one of the /ws folders. The design is that nbs contains the notebooks (potentially copied from the cloned fastai course), lib holds code I write to reuse in my notebooks, and data holds data files I want to cache or otherwise persist across sessions.
  • The files are meant to be created and written inside Jupyter Lab, but they are persisted by the fastai Docker service we’re creating. Committing/pushing them to GitHub is a separate manual step done on the host machine.

For me, my root project directory is ~/code/analyze. To activate the docker environment, I do:

cd ~/code/analyze
sudo nvidia-docker build . -t fastai-image
sudo docker-compose up

# Then in a browser, you will be able to see: `localhost:8900`
# I use a LocalForward entry in my .ssh/config to see this on my laptop
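That LocalForward setup looks roughly like this in ~/.ssh/config (host name, address, and user below are placeholders; 8900 matches the port the compose file publishes):

```
Host gpu-box
    HostName 203.0.113.10            # placeholder address
    User ubuntu                      # placeholder user
    LocalForward 8900 localhost:8900
```

After `ssh gpu-box`, opening localhost:8900 on the laptop reaches Jupyter Lab on the remote machine.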