Update #1
I installed the Arc card on my desktop with the latest 31.0.101.3490 driver. Just moving around Windows, it feels way more stable than the preproduction card I got last year.
I tested the card’s performance on the OpenVINO-Unity project from one of my tutorials. I noticed performance heavily depends on PCIe bandwidth.
I first tried it in the second slot with my Nvidia GPU in the first slot. On my motherboard, this configuration results in PCIe Gen3x4 for the second slot. Framerates in the Unity Editor topped out at around 120fps (fp16). Still better than CPU performance, but not what I would expect from this card.
I then put the Arc in the first slot with PCIe Gen4x16. In this configuration, the card easily maintained 160fps (fp16). In the CLI demo, which uses the same model but has fewer bottlenecks, performance doubled.
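To put the two slot configurations in perspective, here is a rough sketch of the theoretical link bandwidth for each. It assumes the standard per-lane rates (8 GT/s for Gen3, 16 GT/s for Gen4) and 128b/130b encoding, and it ignores packet and protocol overhead, so real throughput will be lower. The function name `pcie_bandwidth_gb_s` is just something I made up for illustration.

```python
# Theoretical one-direction PCIe bandwidth, accounting only for line encoding.
# Assumption: per-lane rates of 8 GT/s (Gen3) and 16 GT/s (Gen4), 128b/130b encoding.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Return the theoretical one-direction bandwidth in GB/s."""
    bits_per_second = TRANSFER_RATE_GT_S[gen] * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8 / 1e9  # bits -> bytes -> gigabytes


second_slot = pcie_bandwidth_gb_s(3, 4)   # Arc in the second slot
first_slot = pcie_bandwidth_gb_s(4, 16)   # Arc in the first slot

print(f"Gen3 x4:  {second_slot:.2f} GB/s")          # 3.94 GB/s
print(f"Gen4 x16: {first_slot:.2f} GB/s")           # 31.51 GB/s
print(f"Ratio:    {first_slot / second_slot:.0f}x")  # 8x
```

The link is about 8x wider in the first slot, while the Unity Editor framerate went up by much less than that, so the second-slot number was probably not limited by the PCIe link alone.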
For reference, using the DirectML execution provider with ONNX Runtime on my Titan RTX tops out at around 140fps (fp32) with the same model. The Arc card hovers around 120fps (fp32) in the DirectML project.
I did notice an odd quirk that might be related to the Xe Matrix Extensions, or XMX (think Tensor Cores on Nvidia cards). The Arc card with the OpenVINO demo seems sensitive to the input resolution.
I use a default resolution of 398x224 (for a 16:9 aspect ratio), which translates to a 384x224 (divisible by 32) input resolution for the YOLOX model. At this resolution, the model detects the same hand gestures with the Arc card as it does with the CPU. However, the confidence scores are much lower, and the bounding box dimensions are slightly different (but still usable).
Moving to an input resolution of 448x256 gets closer to the CPU confidence scores and bounding boxes. I then moved to an input resolution of 896x512, and the OpenVINO demo crashed with the Arc card (but not with the CPU or iGPU). It did not crash using an even higher resolution of 1120x640 (approximately 65fps for those curious).
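For anyone wondering how the source resolutions map to model input resolutions, here is a minimal sketch. It assumes each dimension is simply rounded down to the nearest multiple of 32, which matches the 398x224 to 384x224 case above. The helper name `to_model_input_dims` is hypothetical and is not taken from the demo project.

```python
def to_model_input_dims(width: int, height: int, stride: int = 32) -> tuple[int, int]:
    """Round each dimension down to the nearest multiple of `stride`.

    Assumption: floor rounding, so the input never exceeds the source image.
    """
    return (width // stride) * stride, (height // stride) * stride


# Resolutions mentioned in this post.
for w, h in [(398, 224), (448, 256), (896, 512), (1120, 640)]:
    in_w, in_h = to_model_input_dims(w, h)
    print(f"{w}x{h} -> {in_w}x{in_h}")
```

Only the default 398x224 gets trimmed. The other three resolutions are already divisible by 32 in both dimensions, so they pass through unchanged.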
None of these issues occurred with the Arc card in the DirectML project, which does not use XMX.
Next, I’ll set up a conda environment with pytorch-directml in WSL2 and see if I can train any models. As far as I know, this is currently the only way to train models with an Arc card until the main libraries add support for Intel GPUs.