Just received my Intel Arc A770 GPU

Yeah, the cumulative performance hit of using WSL is one of the reasons why I stick to dual-booting on my desktop. That and the other headaches I’ve encountered.

@cjmills Awesome, thanks Christian! Waiting for the tutorial.

Hi, thanks a lot for your work!!! I've been looking for this kind of comparison for several months.

I do not know if you updated your post because you found the tutorial, but here is a link, just in case.

It includes a link to the Jupyter notebook with the modified training code.

Thanks a lot! Great testing for this series. I hope IPEX 2.0 for XPU will work better!

I’ve been testing the latest release of Intel’s PyTorch extension on native Ubuntu and Windows, and I wanted to share my initial findings here before writing the blog post.

Native Ubuntu

First, I tested performance with the image classification notebook I used previously. Training time on Ubuntu was within six seconds of the time with version 1.13.120+xpu of the extension. The final validation accuracy was identical.
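
For context, here is roughly how the extension gets used for training. This is a minimal sketch following the usual ipex.optimize / torch.xpu workflow, with a dummy model and dummy data standing in for the notebook's actual model and dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch

# Dummy model and data as stand-ins for the notebook's actual model and dataset
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 10)).to("xpu")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 10, (64,))),
    batch_size=16,
)

# ipex.optimize applies the extension's XPU optimizations to the model/optimizer pair
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

model.train()
for images, labels in loader:
    images, labels = images.to("xpu"), labels.to("xpu")
    # Mixed-precision forward pass in bfloat16 on the Arc GPU
    with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
        loss = torch.nn.functional.cross_entropy(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```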

Next, I tested the inference speed for Stable Diffusion 2.1 with the Hugging Face Diffusers notebook I used in this post. Inference speed when using bfloat16 is approximately 25% faster than with the previous version of Intel's PyTorch extension.
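
The bfloat16 inference test boils down to something like the snippet below. It's a simplified sketch rather than the exact notebook code; the prompt and step count are arbitrary:

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the "xpu" device)
from diffusers import StableDiffusionPipeline

# Load Stable Diffusion 2.1 in bfloat16 and move the whole pipeline to the Arc GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.bfloat16
)
pipe = pipe.to("xpu")

# Arbitrary prompt and step count, just to time the denoising loop
image = pipe("a photo of an astronaut riding a horse", num_inference_steps=50).images[0]
image.save("astronaut.png")
```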

Using float16 gives the same inference speed, but the model produces NaN values. The torch.compile() method seems to expect CUDA to be enabled, and the compiled model throws an error when I try to use it.

Last, I tried to run the training notebook for my recent YOLOX object detection tutorial. This notebook was the only one that did not work as expected. First, I had to replace some view operations in the loss function with reshape operations to handle non-contiguous data.
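
The change itself is small: view() requires contiguous memory, while reshape() falls back to a copy when the tensor is not contiguous. A toy illustration (not the actual YOLOX loss code):

```python
import torch

# Transposing produces a non-contiguous tensor, like some intermediates in the loss function
x = torch.randn(8, 4, 16).transpose(1, 2)

# Before: view() raises a RuntimeError because the data is not contiguous
# y = x.view(8, -1)

# After: reshape() copies the data when needed instead of erroring out
y = x.reshape(8, -1)
```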

The training code ran with those changes, but the loss decreased much more slowly than on Nvidia GPUs and never reached usable performance. I tested inference performance with model checkpoints trained on my Nvidia GPU and got identical inference predictions, so the issue does not appear to be with the model itself. The training code also achieved usable accuracy when using the CPU, so it might just be a bug with the extension.

Training time was about 11 minutes for a single pass through the training set on the Arc GPU. For reference, the same pass takes about 2 minutes on an RTX 4090 (my Titan RTX died a while ago).

I have not attempted to compile the extension from the source code to see if that provides different results.

Native Windows

Getting the extension to work on native Windows was a bit of a hassle, but the process is not too bad now that I know the steps. Most of the frustration came from not knowing I needed to disable the iGPU in Windows for the extension to find the Arc GPU.
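
In case anyone hits the same issue, a quick way to verify the extension can see the Arc card is to list the XPU devices it detects. A minimal check using the torch.xpu helpers the extension adds:

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (adds the torch.xpu namespace)

# The A770 should show up here; with the iGPU enabled, it did not on my machine
print(torch.xpu.is_available())
for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))
```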

Fortunately, those initial frustrations were worth it, as the extension works quite well on native Windows.

The total training time for the image classification notebook was slower than on native Ubuntu but faster than in WSL. That's about as well as I could expect, given that PyTorch on native Windows tends to be slower than on Ubuntu, and Python multiprocessing takes longer to start on Windows.

I needed to replace the same view operations with reshape operations in the loss function for the YOLOX training code on Windows. However, this time, the notebook produced a model that was comparably accurate to one trained on Nvidia GPUs. I have no idea why the Windows version of the extension works when the Ubuntu version does not.

Total training time was a bit slower than native Ubuntu but still much faster than the free tier of Google Colab.

The Stable Diffusion inference notebook, also to my surprise, was about 25% faster than on Ubuntu.

I’ll see how much I can streamline the setup process for Windows before making a tutorial. The oneAPI toolkit takes up quite a bit of space.

I might also try compiling the Ubuntu version to see if that resolves the issues with the YOLOX training code.

Well, this is frustrating. I was about to wrap up my tutorial for setting up the extension on Windows and decided to test the installation steps by uninstalling everything and starting from scratch.

The installation process worked as expected, but now I get the same behavior for the YOLOX training code as in Ubuntu. Also, the Stable Diffusion inference notebook is about 1 it/s slower than previously.

I’m now wondering if I had something installed before I originally installed the extension that caused the different behavior.

Is that on a system with an Intel or AMD CPU? @cjmills

An Intel i7-11700K, specifically.

@cjmills Thanks! I'm trying to figure out if I can use intel_extension_for_pytorch with the A770 and an AMD CPU, or if the Arc GPU and ipex need an Intel CPU. So far, I haven't found any answers.

How did the WSL2 test go? I saw that the Arc A-series discrete graphics family does not support GPU virtualization technology.
https://www.intel.com/content/www/us/en/support/articles/000093216/graphics/processor-graphics.html

Doesn’t that prevent the A770 from being available under WSL2 Ubuntu? @cjmills

Wow, I didn’t expect such a huge performance gap. So virtualization actually works, contrary to what Intel says? @cjmills

@cjmills Was your monitor connected to the UHD graphics in both cases?

I have not explored it beyond WSL.

That performance gap is not unique to the Arc card, BTW; it's an issue with WSL that has been around since I first tested WSL for deep learning projects in 2020.

For the best performance, I recommend native Linux, then native Windows, then WSL as the last option.

Misread that; the monitor was connected to the Arc card in all testing.

I’m late to the WSL game; the first time I tested it was last week. I think it’s super convenient. Too bad the performance suffers more than I was hoping. I only have one machine right now, and dual-booting isn’t really my preferred choice; I still rely on some Windows software.

In that case:

Do you think it would make a difference to connect the display output to the integrated UHD graphics instead and have the Arc GPU available purely as a compute device?

Probably not for deep learning tasks, as the card has dedicated hardware for tensor operations. Got to go now.