I created a Docker image from the papersource Dockerfile to run my fast.ai notebook on my host machine, which has a Titan GPU installed. The container starts the notebook as its default command. Here is the command I used to run the container:
docker run -it -p 8899:8888 --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm --shm-size 16G my-registry-host:5000/my-image:latest
My notebook does not seem to use the GPU for model creation, and the fit cycle runs extremely slowly, so I assume it is falling back to the CPU. Also, when I run the nvidia-smi command, I don't see my process consuming any GPU memory.
Could someone please tell me what I should do so that it uses the GPU and not the CPU?
I’m not sure if this is still an open question for you, but in case it is, here are a few questions that might help you debug the situation.
- Are the NVIDIA drivers and CUDA installed on the host?
Try running some of the examples included with CUDA on the host (not within Docker), and run nvidia-smi to see whether the GPU is being utilized.
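A quick sketch of that host-side check (the demo_suite path is an assumption and varies with your CUDA install location):

```shell
# On the host (not inside Docker): confirm the driver can see the GPU at all
nvidia-smi

# If the CUDA toolkit samples are installed, deviceQuery should list the Titan
# (path is an assumption; adjust to your CUDA version/install prefix)
/usr/local/cuda/extras/demo_suite/deviceQuery

# Refresh utilization every second while a training run is active;
# a busy GPU shows non-zero GPU-Util and your python process under "Processes"
nvidia-smi -l 1
```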
- Are you using the NVIDIA Docker runtime?
From your docker command, it looks like you are setting the device mappings manually. With the NVIDIA runtime, those mappings shouldn't be necessary; just pass `--runtime=nvidia` (or `--gpus all` on Docker 19.03+ with nvidia-container-toolkit) to `docker run`.
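For example, your original command could be rewritten like this (a sketch, reusing your image name and ports; assumes nvidia-docker2 or nvidia-container-toolkit is installed on the host):

```shell
# Using the nvidia runtime instead of manual --device mappings
docker run -it --runtime=nvidia -p 8899:8888 --shm-size 16G \
    my-registry-host:5000/my-image:latest

# Or, on Docker 19.03+ with nvidia-container-toolkit:
docker run -it --gpus all -p 8899:8888 --shm-size 16G \
    my-registry-host:5000/my-image:latest
```

Either form lets the runtime inject the device nodes and driver libraries into the container for you.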
- Have you installed the GPU versions of the deep learning libraries?
I see you have a custom Docker image, my-image:latest. Can you make sure you have installed the GPU versions of PyTorch (and TensorFlow, if you're using it)? It seems unlikely to be the cause, but worth checking.
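One way to check which PyTorch build the image contains (run inside the container):

```shell
# A CPU-only PyTorch build reports torch.version.cuda as None,
# while a CUDA build prints the CUDA version it was compiled against
python -c "import torch; print(torch.__version__, torch.version.cuda)"

# The wheel name can also give it away: cpu-only wheels are
# often versioned like "1.x.y+cpu"
pip show torch
```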
- Check GPU availability from Python.
For PyTorch, follow the steps in this answer: https://stackoverflow.com/questions/48152674/how-to-check-if-pytorch-is-using-the-gpu
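The essence of that check, as shell one-liners you can run inside the container:

```shell
# True means PyTorch can see a CUDA device; False means it will use the CPU
python -c "import torch; print(torch.cuda.is_available())"

# If the above prints True, this shows which device PyTorch will use
python -c "import torch; print(torch.cuda.get_device_name(0))"
```

If `is_available()` returns False inside the container but the host's nvidia-smi works, the problem is almost certainly the runtime/device setup or a CPU-only PyTorch build.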
If this still doesn't work, more details on your setup would be helpful (including your Dockerfile). Any debugging steps you have already tried could also give some clues.