Fast-ai docker unable to use the gpu

ranjit · November 21, 2018, 10:17pm

Hi all,
I have created a Docker image using the papersource Dockerfile to run my fast-ai notebook on my host machine, which has a Titan GPU installed. The container starts the notebook as its default command. Here is the command I used to run the container:

docker run -it -p 8899:8888 --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm --shm-size 16G my-registry-host:5000/my-image:latest

My notebook does not seem to use the GPU for model creation, and the fit cycle runs very slowly, so I assume it is using the CPU. Also, when I run the nvidia-smi command, I don’t see my process consuming any GPU memory.

Could someone please tell me what I should do so that it uses the GPU and not the CPU?

dennisobrien · December 11, 2018, 10:49pm

Hi Ranjit

I’m not sure if this is still an open question for you, but in case it is, here are a few questions that might help you debug the situation.

1. Are the nvidia drivers and CUDA installed on the host?
   Try running some of the examples included with CUDA on the host (not within docker) and run nvidia-smi to see if the gpu is being utilized.
2. Are you using the nvidia docker runtime?
   From your docker command, it looks like you are setting device mappings manually. With the nvidia runtime, I don’t think those mappings are necessary. Just include --runtime=nvidia.
3. Have you installed the GPU versions of the deep learning libraries?
   I see you have a custom docker image “my-image:latest”. Can you make sure you have installed the gpu versions of pytorch (and tensorflow if using). Seems unlikely, but…
4. Check gpu availability from Python.
   For pytorch, follow the steps in this answer: https://stackoverflow.com/questions/48152674/how-to-check-if-pytorch-is-using-the-gpu
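
As a rough sketch of that last check, something like the following could be run in a notebook cell inside the container. The function name gpu_report is made up for illustration, and this assumes pytorch is the library in use; it only reports what pytorch can see and does not fix anything.

```python
def gpu_report():
    """Collect basic facts about GPU visibility as seen by pytorch.

    gpu_report is a hypothetical helper name, not part of fastai or pytorch.
    """
    try:
        import torch
    except ImportError:
        # pytorch is not installed in this environment at all
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only build of pytorch was installed
        "built_with_cuda": torch.version.cuda,
        # False means pytorch cannot reach a usable GPU and driver
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda is None, the image most likely has a CPU-only pytorch build. If it shows a CUDA version but cuda_available is False, the container probably cannot see the driver or the device, which would point back at the runtime question above.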

If this still doesn’t work, more details on your setup would be helpful (including your Dockerfile). Any debugging steps you have already tried could also give some clues.

cheers,
Dennis

willismar · December 12, 2018, 1:54am