Unable to install PyTorch with GPU support on an AWS g4dn instance

ohthosegradients · July 10, 2021, 9:34am

Hello everyone,
I’m looking for help installing a PyTorch build that uses the Nvidia Tesla T4 GPU on an AWS g4dn EC2 instance. One of the reasons I love Conda is that it installs all the GPU dependencies by itself, so it boggles me why it isn’t doing so this time.

Steps followed:

1. Follow the instructions in this link (fast.ai course) until mamba is installed in the base Conda environment.
2. Run ubuntu-drivers devices; since it recommends nvidia-driver-460-server, install that driver.
3. Run nvidia-smi, which reports Driver Version: 460.73.01 and CUDA Version: 11.2. I cross-checked on the Nvidia website that this is the correct version.
4. Create a conda env, activate it, and run mamba install --dry-run fastbook.
   The dry run proposes:
   - pytorch 1.9.0 - py3.8_cpu_0 - fastchan/linux-64 - 73 MB
   - torchvision 0.10.0 - py38_cpu - fastchan/linux-64 - 24 MB
   - no cudatoolkit package at all
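The giveaway in that dry-run output is the build string: py3.8_cpu_0 marks a CPU-only PyTorch package. Below is a minimal sketch for flagging this automatically. It assumes the "name version - build - channel - size" layout shown above, and it assumes GPU builds carry "cuda" somewhere in their build string while CPU builds carry "cpu". The function name classify_torch_build is hypothetical.

```python
def classify_torch_build(dry_run_line: str) -> str:
    """Classify a dry-run package line as 'cpu', 'cuda', or 'unknown'.

    Assumes the layout 'name version - build - channel - size', with an
    optional leading list dash.
    """
    # Drop a leading list dash, then split on the ' - ' field separators.
    fields = [f.strip() for f in dry_run_line.strip().lstrip("-").split(" - ")]
    if len(fields) < 2:
        return "unknown"
    build = fields[1].lower()  # e.g. 'py3.8_cpu_0'
    # Assumption: GPU builds mention 'cuda' and CPU builds mention 'cpu'.
    if "cuda" in build:
        return "cuda"
    if "cpu" in build:
        return "cpu"
    return "unknown"


if __name__ == "__main__":
    line = "- pytorch 1.9.0 - py3.8_cpu_0 - fastchan/linux-64 - 73 MB"
    print(classify_torch_build(line))  # prints "cpu"
```

If every torch-related line comes back as "cpu", the solver has picked CPU-only packages, whatever driver is installed.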

I have also tried the following, but mamba still suggests the same CPU-only PyTorch build:

- Downgrading to nvidia-driver-450-server, which installs driver 450.119.04 with CUDA 11.0 support
- Explicitly running mamba install cudatoolkit=11.2 in the Conda env

The default Python version installed is 3.8.5, and upgrading to 3.9 doesn’t work.
The CUDA version listed in the install section of the PyTorch site is 11.1.
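One point that can add to the confusion: the "CUDA Version" in the nvidia-smi header is the highest CUDA version the installed driver supports. It does not mean a CUDA toolkit is installed. NVIDIA drivers are backward compatible, so a driver reporting 11.2 should be able to run packages built against an older toolkit such as the 11.1 listed on the PyTorch site.

Below is a minimal sketch of that comparison. It assumes the usual nvidia-smi header layout. The function names parse_nvidia_smi_header, driver_supports_toolkit and read_nvidia_smi are hypothetical helpers, not part of any NVIDIA tool.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches the header line nvidia-smi prints, e.g.
# "| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |"
_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*(\d+)\.(\d+)")


def read_nvidia_smi() -> str:
    """Run nvidia-smi on the instance and return its text output."""
    return subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout


def parse_nvidia_smi_header(text: str) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (driver_version, (cuda_major, cuda_minor)), or None if no header is found."""
    match = _HEADER.search(text)
    if match is None:
        return None
    driver, major, minor = match.groups()
    return driver, (int(major), int(minor))


def driver_supports_toolkit(driver_cuda: Tuple[int, int], toolkit: str) -> bool:
    """True if a toolkit version like '11.1' is no newer than the driver's CUDA version."""
    major, minor = (int(part) for part in toolkit.split(".")[:2])
    return (major, minor) <= driver_cuda
```

On the instance itself you would pass the output of read_nvidia_smi() to parse_nvidia_smi_header(). With the 460 driver's (11, 2), both an 11.1 and an 11.2 cudatoolkit pass the check.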

I would very much prefer not to install the packages individually, as described on the Nvidia website and here (Medium link).

So, what am I doing wrong? How do I get mamba/conda to install a build that uses the GPU?

Any suggestions to help me solve this issue are welcome.

Thank you.

ohthosegradients · July 10, 2021, 9:05pm

I solved it: I had to add the conda-forge and nvidia channels.

After several tries, this is the command I settled on:

mamba install pytorch torchvision cudatoolkit=11.2 fastbook python=3.9.5 -c nvidia -c conda-forge
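To confirm that the resulting environment actually uses the T4, you can ask PyTorch directly from inside the activated Conda env. torch.cuda.is_available(), torch.version.cuda and torch.cuda.get_device_name() are standard PyTorch calls. The gpu_report helper is a hypothetical name, and its import is guarded so the script still runs where PyTorch is absent.

```python
def gpu_report() -> dict:
    """Summarize whether the installed PyTorch build can use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the build was compiled against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # True only if the build has CUDA support and a usable GPU/driver is present.
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # On a g4dn instance this should name the Tesla T4.
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda prints None, the environment still holds a CPU-only build like the one the original dry run proposed.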

anderslindstrom · November 29, 2023, 3:13am

Thank you, this is very helpful. I had the same problem, and this fixed it.