Within the install guide discussion, @Interogativ and I had been discussing how to get Swift for TensorFlow working on Nvidia’s Jetson devices, and I believe I finally have a fully GPU-enabled build operational. I wanted to pull this out into its own topic, in case anyone else was interested.
The Nvidia Jetson single-board computers are interesting for exploring inference at the edge, because they combine a relatively low-power ARM64 processor with CUDA-compatible mobile Nvidia GPUs in a small package. In particular, their new $99 Jetson Nano provides a Maxwell-based GPU supporting CUDA 10.0 and cuDNN 7.3.1 along with a quad-core CPU.
Traditionally, it had been difficult to get a Swift toolchain building correctly on ARM64, but Neil Jones’ repository here has instructions on how to make that work now. Their latest builds didn’t have TensorFlow support or CUDA enabled, but with a few slight changes I was able to get that building. Here are two toolchains I’ve built and temporarily hosted:
- ARM64 Swift for TensorFlow as of a 5/1/2019 snapshot (CPU only, 350 MB)
- ARM64 Swift for TensorFlow as of a 5/1/2019 snapshot (CUDA 10.0, cuDNN 7.3.1, 530 MB)
Both of these work on the Jetson devices I’ve tried them on (Jetson Nano, Jetson Xavier), but they do require the latest Jetpack (Nvidia’s OS / tools image). CUDA 10.0 and cuDNN 7.3.1 are pre-installed by Jetpack, so you can skip over those install steps in the guide. I also found that I needed to install the following packages:
sudo apt-get install python3-venv python3-dev libcurl4-openssl-dev libfreetype6-dev
to get the Swift Jupyter kernel to install correctly. I may be missing a package or two in there.
While the Jetson Nano has enough processing power and a CUDA-compatible GPU for doing training, it does have a problem with memory. It only has 4 GB of memory onboard, and shares that between CPU and GPU. On the Nano, once I’ve loaded up the Jupyter notebook server and Chromium browser, the system only has ~500 MB of available memory left. As a result, once I try to load a large CUDA tensor (such as is created when loading the MNIST dataset in one of the notebooks), the GPU runs out of available memory and allocation fails. This shouldn’t be as much of a problem on the more powerful Jetson devices, like the TX2 with its 8 GB of memory or the Xavier with 16.
The Jetson Nano wasn’t going to be the optimal training computer, but for $99 for a full computer capable of running accelerated Swift for TensorFlow it could be a good entry-level platform for experimentation. It’s certainly useful for edge inference, and it should be easy to transfer Swift for TensorFlow code and models developed elsewhere to these single-board computers. The TX2 and Xavier provide a lot more processing power for robotics and other applications.
(I posted my build process over in the Swift for TensorFlow mailing list, for reference.)