Platform: Windows 10 using WSL2 w/GPU

FourMoBro · June 18, 2020, 10:24pm

Finally, you can use the same computer to run Windows 10, while coding within a linux environment AND utilizing CUDA operations for deep learning. I myself prefer Win10 as my daily driver. But when it comes time to code, I am usually remoting to a cloud environment, or setting up a virtual machine, or running a separate computer without a monitor (headless) because it has beefier hardware. As of yesterday (6/17/20), NVIDIA has detailed steps to enable GPU processing within the Windows Subsystem for Linux (WSL) version 2. This could very well be the “dream” setup if have the hardware to run a GPU, and a bit of patience to get things installed. With Version4 of fastai course soon to be released, maybe now is the time to get your local machine configured in this fashion.

Caution:
With this being brand new, it is not for everyone. It can take several hours to get configured from a clean installation of Win10 into an fully operational environment. You need to install “beta” versions of software, some of which have expiration dates. If you don’t turn off updates, your system may change and things break. You need to install special drivers for the video cards. You need to register to even get access to the downloads. Some of these downloads may even be restricted for export, I do not know. But if this is something you want to tackle, read on.

Installation steps:
A detailed article on CUDA and WSL2 can be found here:

The more general steps that need to be taken can be found here:
https://docs.nvidia.com/cuda/wsl-user-guide/index.html
Here is what I did the first and/or second time trying to get this installed:
Setup Windows – Base Version

Download Win10 ISO from Microsoft. https://www.microsoft.com/en-us/software-download/windows10 I wanted the ISO file to setup another computer, which was my Ubuntu machine. The file is 3.85GB
I used Rufus to “burn” the image to my USB drive. I normally use balenaEtcher but they recommended Rufus.
Then I installed Win10 onto the other machine, overwriting my Ubuntu install. The install loads a non-Preview version of windows, version 19041.264. WSL2 w/GPU needs 20145 or higher.
Join the Microsoft Windows Insider Program, within the windows update screen. Join the “Fast” Track and check for updates. You may have to do several installs and reboots until you install the 20H2 Feature Update. This took my Win10 version to 20150.1000. You can check you windows version by typing “winver” in the run command area.

Setup Windows – NVIDIA driver
Do not proceed until you have updated your windows version to 20145 or higher. Also, this is not the same general availability driver. It is special, and must be done from the link shown.

Download the appropriate video driver for your system from https://developer.nvidia.com/cuda/wsl . This will require you to register with the NVIDIA development program. The file is 554MB in size. You are downloading for windows, and NOT linux. In fact, they say DO NOT install a video driver once you get into your linux distribution.
Install the video driver. I just used the Express option and left everything default. After a reboot or two, we can setup wsl2.

Setup Windows – WSL2
These steps follow https://docs.microsoft.com/en-us/windows/wsl/install-win10

Open a PowerShell window as an Administrator
Run dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
Run dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
Reboot the machine
Open a PowerShell window as an Administrator
Run wsl --set-default-version 2
Run wsl cat /proc/version . You will probably see that the kernel version is 4.19.104 or so. We need kernel 4.19.121+.

Setup Windows – WSL2

Go back to Windows Update, and look under “Advanced Options”.
Make sure the “Receive updates for other…” is set to “On”
Check for updates.
After a few checks, the Windows Subsystem for Linux Update 4.19.121 should appear.
Install
Reboot.

Setup Windows – Linux Disto

Open the Microsoft Store
Search for your preferred Linux distro. First time out, I chose Ubuntu 20.04, but ended up with problems. I’m not saying it won’t work, but the second time through I chose Ubuntu 18.04.
Get
Install
Launch
Set credentials

Setup Ubuntu within WSL

Open Powershell
Run wsl and this should start Ubuntu.
You now have a choice as to how you want to run your CUDA applications. Are you going to run docker and containers? Or are you going to run the CUDA apps natively? That is where Steps 4,5,6 of the nvidia walkthu tries to help. Let’s start with docker.

Setup Ubuntu – Docker

Within Ubuntu, run curl https://get.docker.com | sh . Now when you do this, it will tell you that it detects WSL and that you should run Docker Desktop for Windows. Well, I tried that, but never got it to work. I don’t know if it was docker’s fault, ubuntu 20.04’s fault or what, but it did not go. If you are adventurous and can get it to work, please let me know as I would prefer to use it.
Run a few more commands…
a. distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
b. curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add –
c. curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
d. curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
e. sudo apt-get update
f. sudo apt-get install -y nvidia-docker2
In a new terminal….
a. sudo service docker stop
b. sudo service docker start
Verify the docker service and GPU compute by running one of their examples.
a. docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
b. docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter

This should give you confidence that the general pieces are installed to use docker and cuda, but….

Setup Ubuntu – CUDA
…you may not have access to a CUDA container, or don’t want to use docker. In that case, it appears that you need to install cuda at the Ubuntu level. When I tired first time out, I could not get fastai to use the GPU even though it was working fine in docker. When running Ubuntu natively, we don’t have to install CUDA, as pytorch brings over what it needs. But this is no ordinary install. There needs to be communication to the video driver and that can only happen with installing CUDA (I assume).

Navigate to https://developer.nvidia.com/cuda-downloads
Select Linux
X86_64
Ubuntu
18.04
Deb local and follow the steps. I would list out the steps but am unsure if that is allowed as CUDA is also part of the NVIDIA developer program.
It should be noted that even with CUDA installed, running nvidia-smi within Ubuntu will not yield any results. It says it cannot communicate with the driver. However, during training, the command can be run within PowerShell to get an instant look at the memory usage, but the “watch” command will not work.

Setup Ubuntu – fastai2

You can now follow the Ubuntu setup from https://forums.fast.ai/t/platform-local-server-ubuntu/65851/26
Follow step 2, 6, 7, 9* and some of the other posts to get your installation working.
You can verify if GPU is being used by running torch.cuda.is_available() and hope that it returns True.

Setup Windows – Optional Steps

I would turn off/pause updates once this is working properly to prevent something breaking
Now would be a good time to install vscode if that is your preferred editor
I also like using Windows Terminal from the Microsoft Store. It gives a tabbed interface which can have WSL in one tab and PS in another or command prompt in another, etc

sgugger · June 18, 2020, 11:34pm

I’ve been very frustated with this. I have the proper versions of Windows 10 and the WSL2 kernel, re-installed the NVIDIA driver twice, but I can’t get the GPU to be recognized. For instance, after the process (and without even installing CUDA), I get torch.cuda.is_available() to return True, but when I try to put a tensor on the GPU, the kernel freezes (not even crashes, just hangs out forever).

For the TensorFlow docker (and using the WSL docker, as you said the desktop docker is useless), I get an error when testing if cuda is available

SlowLlama · June 21, 2020, 12:53am

I followed the instructions with a minor change and have the GPU working under WSL2 with Ubuntu 20.04 on two separate computers.

Note: wsl --set-default-version 2 will display a warning if the correct kernel is not installed. I followed the instructions as stated and checked for updates again. The kernel updated and wsl --set-default-version 2 no longer displayed a warning

wsl cat /proc/version only works if a Linux distribution is installed.

When the prompt and warning message comes up for Ubuntu 20.04 and Docker, I waited 20 seconds and let it install rather than hitting Ctrl-C. It installed successfully.

Output from docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark on my laptop

Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation MapSMtoCores for SM 7.5 is undefined. Default to use 64 Cores/SM GPU Device 0: "GeForce RTX 2060" with compute capability 7.5

Compute 7.5 CUDA device: [GeForce RTX 2060] 30720 bodies, total time for 10 iterations: 70.345 ms = 134.156 billion interactions per second = 2683.126 single-precision GFLOP/s at 20 flops per interaction

knowever · June 21, 2020, 4:59pm

Has anyone tried building an image for fastai course-v3 in docker and run it on WSL2? I wondering if the image I have build would work the same in WSL2. On a Ubuntu host, I had issue with Pytorch and swift inside docker, workarounds :

--shm-size 48G “PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough” https://github.com/pytorch/pytorch#docker-image
--cap-add SYS_PTRACE “adjusts the privileges with which this container is run, which is required for the Swift REPL.” https://github.com/google/swift-jupyter/blob/master/README.md

artuskg · June 22, 2020, 12:03pm

Hmmm. No success yet. After joining fast ring, my winver got bumped to 20150.1000. Then I installed the preview WDDM 2.9 NVIDIA driver and afterwards wsl2 with Ubuntu 18.04. Kernel was already 4.19.121-microsoft-WSL2-standard on a fresh install. However, I am still stuck:

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

yields:

> 1 Devices used for simulation
GPU Device 0: "GeForce GTX 1050" with compute capability 6.1
> Compute 6.1 CUDA device: [GeForce GTX 1050]
CUDA error at bodysystemcuda_impl.h:159 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&m_deviceData[0].event)"

Googling indicated that this might imply a busy device but I am not sure how to get any further. Any ideas / hints?

bsalita · June 24, 2020, 1:38am

I’ve had mostly success with Ubuntu 20.04 on WSL2. I’m using an Intel 8700K, 1080ti, 32GB memory, 1TB SSD. The display uses Intel graphics, not the GPU. The python and tensorflow docker containers are working. fastai2 works both in and out of docker containers.

The only issues I’ve seen is that GPU memory has garbage collection issues. I have to run course_v4/01_intro.ipynb with a batch size of 16 to get it to complete without running out of GPU memory.

Does anyone know of a community forum dedicated to WSL2 GPU and data science?

Thanks to @FourMoBro for his brilliant writeup.

bsalita · June 24, 2020, 1:52am

I suppose we could exchange our WSL2 environments using the wsl --export feature. That might duplicate the issue on the other’s system.

bsalita · July 2, 2020, 8:20am

Benchmarks of WSL2 CUDA performance. Remember, WSL2 CUDA is beta and might not reflect release performance.

https://www.phoronix.com/scan.php?page=news_item&px=WSL2-CUDA-Perf-Early-Look

FourMoBro · July 5, 2020, 4:04am

So this evening I updated Windows to version 20161.rs_prerelease.200627-1754 and also updated the NVIDIA driver from 455.38 to 455.41. My training times have dropped by over 50% in the few samples that I have run, and my times are closer to what bare metal Ubuntu was giving me. It is nice that this performance has improved in just under 3 weeks since release.

axavio · August 15, 2020, 2:41pm

I just can’t seem to get kernel 4.19.121 on my WSL2. It is stuck at 4.19.104 even after multiple Windows update checks. Are we just at the mercy of the MS update system or is there a way to force the kernel update?
Win 10 Pro 2004 Build 19041.450

FourMoBro · August 15, 2020, 4:09pm

If you are still on 19041.450, it is too old. As of today, I am on 20190.

axavio · August 15, 2020, 5:39pm

Thanks for the fast reply. I had been on a 20H2 Windows Insider with an acceptable build level (I don’t remember the exact build, but know I had checked against your instructions) and it wasn’t working there, so that’s why I had reverted. I will try again.

axavio · August 15, 2020, 6:47pm

On Windows Insider “dev channel” (the “beta channel” wasn’t enough) with Build 20190.1000 now. And it did finally update the WSL kernel. It gave me a green screen crash during the update, but all worked out OK after a couple more reboots.

Thanks for your instructions!

FourMoBro · August 15, 2020, 9:01pm

I have been getting the GSOD more often these past two builds. I hope there is some stability in the near future.

reyf · August 16, 2020, 8:10am

Is there any word on when WSL2 comes out of beta? I’ve been reading July then August for a while now and there seems to be no sign of it. Given the discussion here, I think I’d rather skip the beta for now and wait for something a bit more stable…

axavio · August 17, 2020, 3:07pm

The install is working for me now, following the instructions. It seems a bit slower that expected, so I will be debugging that soon.

RTX2070 Super, Python 3.8.5, pytorch 1.6.0

Using the benchmark noted by @bsalita above, I get:
WSL2 (Ubuntu 18.04 distro): 195 seconds
Pop (Ubuntu) 20.04 (native): 98 seconds

FourMoBro · August 17, 2020, 3:51pm

For reference, fastai2 0.0.17, torch 1.5.0, using a 1080Ti GPU I get:

when running fastai2/nbs/course/lesson1-pets.ipynb.

When trying a similar notebook, fastbook/01_intro.ipynb, I get
Capture2

So there is speed comparable to bare metal without needing to dual boot or the like.

s.s.o · August 17, 2020, 7:11pm

Normally on windows the num_workers has to be 0. Is this issue solved and working ?

FourMoBro · August 17, 2020, 7:46pm

This code is not really running on windows, it is running on Ubuntu using WSL2.

s.s.o · August 17, 2020, 11:09pm

So, with WSL2 I can run num_workers bigger than 0 ??? That’s what I wanted to learn. Did you test it?