PyTorch DirectML for non-Nvidia GPUs on Windows

Found this in my social media feed today but haven’t tried it yet:

Theoretically, you’ll be able to use the GPU on Windows with DirectX 12 by selecting this backend. This may open up more ways to use fastai on machines without Nvidia GPUs, though Mac users are still SOL unless they dual-boot Windows.
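
Presumably “selecting this backend” just means moving tensors to the “dml” device string that the pytorch-directml package registers, something like this (untested on my end):

import torch

x = torch.randn(3, 3).to("dml")   # allocate on the DirectML device
y = (x + x).to("cpu")             # compute there, then bring the result back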

Hello, I’m new here; this seems to be the only thread on the internet discussing fastai on pytorch-directml.
Did you actually give it a try? I’m just starting the fastai course and would like to run it locally on my Radeon 5700 XT GPU. This could be the only way to do it, but I don’t know the library well enough yet to test it myself.

Thanks for sharing this info; it’s useful. Keep it up.

Update: I’ve been able to install DirectML and fastai together. PyTorch seems to use DirectML just fine, but fastai seems to fall back to the CPU even when I specifically tell it to use DirectML.

Here’s how to reproduce this on a Windows machine. Yes, I know I’m using pip, but DirectML doesn’t seem to like Conda for some reason. And yes, you’ll eventually need to uninstall the default PyTorch and replace it with pytorch-directml.

conda create -n directml-test python=3.8 ipython

conda activate directml-test

pip install fastai torchvision==0.9.0

pip uninstall torch

pip install pytorch-directml

Now open up Python and run the following:

import torch

tensor1 = torch.tensor([1]).to("dml")
tensor2 = torch.tensor([2]).to("dml")
dml_algebra = tensor1 + tensor2
dml_algebra.item()

You shouldn’t get an error if DirectML is set up correctly.
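
If you want a slightly more verbose check (just my own sketch, assuming the same “dml” device string used above), wrap the test in a try/except so a misconfigured setup prints a message instead of a bare traceback:

import torch

try:
    x = torch.ones(1).to("dml")
    print("DirectML device is usable:", (x + x).item())
except RuntimeError as e:
    print("DirectML device not available:", e)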

Now run the following:

from fastai.vision.all import *

def catsanddogs(mydevice="dml"):
    path = untar_data(URLs.PETS)
    files = get_image_files(path/"images")
    # In the Pets dataset, cat images have capitalized filenames and dog images don't
    def label_func(f): return f[0].isupper()
    # Point fastai's default device at whichever device was requested
    defaults.device = torch.device(mydevice)
    dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224), num_workers=0)
    learn = cnn_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(1)
    return learn

Invoke the following and open your Windows Task Manager, monitoring your CPU and Memory usage patterns:

catsanddogs('cpu')

Afterwards, run the following to do exactly the same thing as above, but on the DirectML GPU:

catsanddogs('dml')

At least for me, I see exactly the same pattern of CPU and memory usage, as well as the same training speed. It’s as if it’s using the CPU instead of the DirectML GPU. I’ve confirmed the same behavior on two different computers! I’m not sure if your experience is the same; maybe I’m not using the right commands or something?
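
In case anyone wants to dig further, here’s a rough check I plan to try next (just a sketch; it assumes the function above returns the learner). If any of these still print “cpu”, that would explain the Task Manager pattern:

learn = catsanddogs('dml')
print(defaults.device)                        # what fastai thinks the default device is
print(learn.dls.device)                       # device the DataLoaders move batches to
print(next(learn.model.parameters()).device)  # device the model weights actually live on

If the model turns out to be sitting on the CPU, explicitly calling learn.model.to("dml") before fine_tune might be worth trying, but I haven’t verified that.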