Unable to install PyTorch GPU on an AWS g4dn instance

Hello everyone,
I’m looking for some help installing PyTorch so that it uses the Nvidia Tesla T4 GPU available in AWS’s g4dn EC2 instance. One of the reasons I love Conda is that it installs all the GPU dependencies by itself, so it baffles me why it isn’t doing so this time.

Steps followed:

  1. Follow the instructions in this link (fast.ai course) until mamba is installed in the base Conda environment.
  2. Run ubuntu-drivers devices; since nvidia-driver-460-server is recommended, install that.
  3. Run nvidia-smi; it reports Driver Version: 460.73.01 and CUDA Version: 11.2. I cross-checked with the Nvidia website, and that is the correct version.
  4. Create a conda env, activate it, and execute mamba install --dry-run fastbook.
  5. That command suggests installing:
    • pytorch 1.9.0 - py3.8_cpu_0 - fastchan/linux-64 - 73 MB
    • torchvision 0.10.0 - py38_cpu - fastchan/linux-64 - 24 MB
    • with no mention of the cudatoolkit package
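For anyone reading this later: the giveaway in that dry-run output is the conda build string. py3.8_cpu_0 contains cpu, whereas a GPU build would carry a cuda tag (the example CUDA build string below is my assumption about the channel's naming convention, not taken from this output). A tiny sketch of the check:

```python
def is_cpu_build(build_string: str) -> bool:
    """Return True if a conda build string marks a CPU-only PyTorch build."""
    return "cpu" in build_string

print(is_cpu_build("py3.8_cpu_0"))                  # the build from the dry run -> True
print(is_cpu_build("py3.8_cuda11.1_cudnn8.0.5_0"))  # a hypothetical CUDA build -> False
```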

I have also tried the following, but they return the same CPU-only PyTorch suggestion:

  • downgrading to nvidia-driver-450-server, which installs CUDA 11.0 and driver 450.119.04
  • installing cudatoolkit=11.2 explicitly into the Conda env with mamba install cudatoolkit=11.2

The default Python version that gets installed is 3.8.5, and upgrading to 3.9 doesn’t help.
The CUDA version mentioned in the install section of the PyTorch site is 11.1.

I would very much prefer not to install the packages individually as described on the Nvidia website and here (medium link).

So, what am I doing wrong? How do I get mamba/conda to install the GPU build?

I welcome suggestions to help me solve this issue, please.

Thank you.

I solved it: I had to add the conda-forge and nvidia channels.

After several tries, this is the command I settled on:

mamba install pytorch torchvision cudatoolkit=11.2 fastbook python=3.9.5 -c nvidia -c conda-forge
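After installing, a quick way to confirm that the environment actually picked up a GPU-capable build (a generic check, not specific to this setup; it falls back gracefully when torch isn’t installed in the current env):

```python
# Hedged sketch: report whether the installed PyTorch build can see the GPU.
try:
    import torch
    msg = f"torch {torch.__version__} | cuda available: {torch.cuda.is_available()}"
except ImportError:
    msg = "torch is not installed in this environment"
print(msg)
```

On the g4dn instance, `cuda available: True` would confirm the T4 is visible to PyTorch.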