Platform: NVIDIA Jetson XAVIER NX

How to run Pytorch 1.5 and Fast.ai V2.0 on the NVIDIA Jetson Xavier NX Board:
Version 1.0 – 5/25/20

It’s been a little over a year since I wrote the “How to run Pytorch 1.0 and Fast.ai V1.0 on an NVidia Jetson Nano Board” article. NVidia just (May 2020) announced the NVIDIA Jetson Xavier NX devkit, featuring the NVidia Volta architecture, 8GB of CUDA RAM, 384 NVidia CUDA cores and 48 Tensor cores. It has a bunch of other features too. It sells for $399 (got mine on Amazon), which is four times the price of the Jetson Nano. But since it has 8GB of CUDA RAM, you can train with larger batch sizes. I’m never the first one on my block to buy anything, except in this case. I bought one to try it out.

Here’s how you can also make it run the latest and greatest (as of May 2020) version of pytorch and fast.ai (V2). This install is for Python3 only. This install is NOT recommended if you don’t have much Linux experience, don’t know how to use SSH, or have no idea how IP networking works or what an IP address is.

What You Need:

  1. A ($399) NVIDIA Jetson Xavier NX Development Kit – These can be ordered from many places; I got mine from Amazon.
  2. A (~$15) Class 10 64GB or larger Micro SD card – Make sure it’s Class 10 or higher, speed-wise.
  3. A USB Keyboard – Got a PC? Use that one.
  4. An HDMI or DisplayPort cable and monitor
  5. An Ethernet cable and a wireless router or hub on your network. The NX does have native wireless support.
  6. A PC that you can plug the Micro SD card into to flash it. If you only have USB ports, that’s fine. Spend the extra $10 and buy a USB to Micro SD card adapter.
  7. Software for your PC that can create an SSH terminal, and software that can transfer files using SSH. For Windows I recommend Tera Term (free) and WinSCP (free). Use Google to find where you can download these if you don’t have them already.
  8. Download this zip file (xavier_nx_setup.zip) to your PC and remember where you put it. It contains these instructions as a PDF plus the scripts I’ve written:
    1_setup_fastai_apt.sh
    2_setup_fastai_pip.sh
    3_setup_fastai_fastai.sh
    4_setup_jupyter.sh
    5_setup_course_v4.sh
    6_xavier_headless.sh
    0_venv.txt
    jupyter_notebook_config.py
    xavier_nx_setup.pdf

What to do first:

After your shiny new box arrives, go to the NVidia developer website and follow these instructions to get started. Be sure you do all of the following:

  1. Download the NVIDIA Ubuntu 18.04 Zipped Image for the NX.
  2. Flash it to the SD card using their instructions. I use balenaEtcher software to do the flashing.
  3. Put the SD card into the NX, plug in the USB keyboard, monitor and Ethernet cable attached to the router (to complete this process you must have Internet access).
  4. Boot the machine, accept their license, etc.
  5. Pick a machine name that works on your network, and pick a user name and password you can remember; you’ll need to know them!

Once it boots up and you’ve verified it’s on your network and the Internet:

  1. The NX flash card has Ubuntu 18.04 LTS and their version of the Unity desktop. On the top right is a mode dropdown. It defaults to the mode using only 10W with 2 CPU cores active. I set mine to use MODE_15W_6CORE.
  2. Go to the Network Settings, find the IPv4 address of your machine and write it down, or, if you understand IP networking, set up a fixed IP address.
  3. Set up an SSH server on the NX: “sudo apt-get install openssh-server”. You may have to do “sudo apt update” followed by “sudo apt upgrade” first.
  4. Use the PC terminal program to open an SSH session with your NX at the IP address (see step 2).
  5. Use your file transfer program to transfer the files in xavier_setup_fastai.zip to your NX user’s ~/Downloads/xavier_setup_fastai directory.

From either the console or via an SSH connection, set execute permissions on the scripts you’ve just downloaded:

cd ~/Downloads/xavier_setup_fastai
chmod +x *.sh
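
Before starting the long installs, it can be worth confirming that every numbered script actually made it across and is executable. A minimal sketch, assuming the file names listed above; the function name check_scripts is hypothetical and not part of the zip:

```shell
# Sketch: report any setup script that is missing or not executable.
# File names are taken from the list above; check_scripts is a made-up helper.
check_scripts() {
    local dir="$1" bad=0 name
    for name in 1_setup_fastai_apt.sh 2_setup_fastai_pip.sh \
                3_setup_fastai_fastai.sh 4_setup_jupyter.sh \
                5_setup_course_v4.sh 6_xavier_headless.sh; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $name"
            bad=1
        elif [ ! -x "$dir/$name" ]; then
            echo "not executable: $name"
            bad=1
        fi
    done
    return "$bad"
}

# Example: check_scripts ~/Downloads/xavier_setup_fastai && echo "all scripts ready"
```

The function returns non-zero if anything is wrong, so it can gate the next step.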

Use python venv to create a virtual python3 environment:

  1. These instructions are in the 0_venv.txt file.
  2. Go to your home directory: cd ~
  3. Create an environments directory: mkdir envs
  4. Go to the new envs directory: cd envs
  5. Create a virtual environment for fastai2:
    python3 -m venv fastai2
  6. Activate the fastai2 environment:
    source ~/envs/fastai2/bin/activate
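
A quick way to tell whether the environment is really active: the venv activate script sets the VIRTUAL_ENV variable, so checking it is enough. A small sketch; the function name venv_status is hypothetical:

```shell
# Sketch: report whether a Python venv is active in this shell.
# Relies on VIRTUAL_ENV, which the venv activate script exports.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "no virtual environment active"
        return 1
    fi
}

# Example: venv_status || source ~/envs/fastai2/bin/activate
```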

Install pytorch and fast.ai:

If at this point you want to try the standard fast.ai and pytorch install, go right ahead, it will fail. For a bunch of reasons I’m not going to go into now, the standard pip commands simply won’t work for this.

If you just run the scripts you downloaded in order, you should be up and running by tomorrow. This will take several hours at best, so don’t hold your breath. Each script has a number (for example, 1_setup_fastai_apt.sh), and you must run ALL of them IN ORDER. You can try combining them all into one big script if you want, but in case of errors it’s better to just run them one at a time. My advice is to run the first one, check back in a couple of hours, run the next one if it worked and then call it a night. All of the scripts require sudo, so they may stop and ask you for a password. After the 2_setup_fastai_pip.sh script finishes, the rest will go more quickly.

  1. ./1_setup_fastai_apt.sh - takes a couple of hours.
  2. ./2_setup_fastai_pip.sh – takes a Loooong time. Run overnight
  3. ./3_setup_fastai_fastai.sh
  4. Log out and reboot (very important)
  5. Login again
  6. Activate your VENV: source ~/envs/fastai2/bin/activate
  7. Go back to the directory where the scripts from xavier_nx_setup.zip are installed
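
If you do want to chain the first three scripts anyway, the important part is stopping at the first failure instead of ploughing on. A minimal sketch of that idea; run_in_order is a hypothetical helper, not something in the zip, and sudo may still stop to ask for a password partway through:

```shell
# Sketch: run the given scripts one after another, stopping at the first
# one that exits with a non-zero status. run_in_order is a made-up helper.
run_in_order() {
    local script
    for script in "$@"; do
        echo "=== running $script ==="
        if ! "$script"; then
            echo "FAILED: $script (fix the problem, then rerun it)"
            return 1
        fi
        echo "ok: $script"
    done
}

# Example (from the script directory, with the venv active):
# run_in_order ./1_setup_fastai_apt.sh ./2_setup_fastai_pip.sh ./3_setup_fastai_fastai.sh
```

The reboot after step 3 still has to be done by hand before moving on to the Jupyter script.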

Install Jupyter notebook:

After fast.ai is installed, it tells you:

Done with part3 - fastai is now setup, you must logout and login again before doing part4

This is because the Jupyter install doesn’t export the shell variables it needs to run. So shut down all your terminals, SSH sessions, etc. and just reboot the NX from the GUI. Once it comes back up, open a terminal from SSH or the GUI and run:

./4_setup_jupyter.sh

This also takes a while, so again, don’t hold your breath. The last step of this script asks for your Jupyter password. This IS NOT your login password; it is a separate password you can use to log into Jupyter notebook from any PC on your network, so pick an appropriate password and write it down. The default Jupyter notebook install only lets you log in from the console or GUI; the modified jupyter_notebook_config.py file you downloaded and installed with the script allows you to log in from any machine on your network. To run Jupyter notebook you will have to open a terminal or SSH session, activate your fastai2 environment and then run Jupyter notebook.

source ~/envs/fastai2/bin/activate
cd ~/fastai2
jupyter notebook
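
To reach the notebook from your PC, point a browser at the NX’s IP address. A tiny sketch for printing the address to type in, assuming Jupyter’s default port 8888 and plain HTTP (the jupyter_notebook_config.py from the zip may set something different); the function name and the example IP are hypothetical:

```shell
# Sketch: build the notebook URL from the NX's IP address.
# Port 8888 is Jupyter's default; pass a second argument if yours differs.
jupyter_url() {
    local host="$1" port="${2:-8888}"
    echo "http://$host:$port/"
}

# Example (use the IP address you wrote down earlier):
# jupyter_url 192.168.1.50
```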

If it doesn’t run, it’s probably because you didn’t log out and in again.
That’s it. You’re done; you can now run pytorch and fast.ai.

If you want to install Version 4 of Part 1 of the Fast.AI Course:

cd ~/fastai2
git clone https://github.com/fastai/course-v4.git

or
run 5_setup_course_v4.sh from the xavier_setup_fastai directory.

A Note about VENV

Whenever you log back in, you have to get back into your virtual environment. You can do this with:

source ~/envs/fastai2/bin/activate

Now you’re in a virtual environment, you can’t just say “python …” to do something, or “pip …” to install something; you have to say “python3” or “pip3”, because that specifies which version of python we’re using. But if you’re lazy and forgetful, like me, you can add the alias command below to your .bashrc file or just type it every time you do a VENV source command:

alias python=python3
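
If you go the .bashrc route, it is easy to end up with the same line appended several times. A small sketch that only appends the line when it is not already there; add_line_once is a hypothetical helper name:

```shell
# Sketch: append a line to a file only if that exact line is not present.
# add_line_once is a made-up helper, shown here for the alias above.
add_line_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example: add_line_once 'alias python=python3' ~/.bashrc
```

Running it twice leaves a single copy of the alias in the file.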

Memory isn’t everything, but it’s definitely something:

Back in the old days (of say 2010), 8GB was a lot of memory. Today, if you’re not using the GPU or not training, this is enough to get your notebooks running well (the NX version of Ubuntu 18.04 also has 4 GB of virtual swap file, which helps quite a bit). But if you’re using CUDA, it doesn’t use swap space, so you need each and every byte of that 8GB.

To get that, it’s time to jettison the GUI and run via a remote console using SSH. Running the 6_xavier_headless.sh script will uninstall the GUI and purge a couple of unnecessary packages that take up over 300MB of RAM. So after you run this and reboot, you’ll only have console access to the NX, but the machine will start using only about 564MB of RAM, leaving you with 7.6GB for pytorch and fast.ai.

  1. run: ./6_xavier_headless.sh
  2. reboot and SSH into your NX.
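
To see what going headless actually buys you, compare available memory before and after. A minimal sketch that reads the MemAvailable field from /proc/meminfo (a standard Linux kernel field, reported in kB); the function name is hypothetical:

```shell
# Sketch: print available memory in MB, read from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument is only for testing.
mem_available_mb() {
    local file="${1:-/proc/meminfo}"
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "$file"
}

# Example: run once with the GUI and once after going headless.
# mem_available_mb
```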

NVIDIA Utilities you need to install and know about:

  1. There is a great package called jetson_stats that contains a utility called jtop, which is the equivalent of nvidia-smi (which doesn’t work on the NX). There is also a jetson_config utility that can purge the desktop for you as well as do other neat things.

  2. The NX has a very useful built-in command line utility, nvpmodel (must be run with sudo). It lets you set the board modes (number of CPUs, watts used, fan speed, etc.). I set my processor mode to 6 CPUs using 15W:
    sudo nvpmodel -m MODE_15W_6CORE
    and set my fan to max with:
    sudo nvpmodel -d cool
    See their development guide for complete information on this utility.
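
Before changing modes it helps to see which one is active. A hedged sketch: nvpmodel -q queries the current power mode and prints its name and numeric ID. On the JetPack releases I’m aware of, -m expects that numeric ID rather than the mode name, so treat the exact argument as something to verify on your own board. The wrapper function name is hypothetical.

```shell
# Sketch: show the current power mode, or say so if nvpmodel is absent.
# show_power_mode is a made-up wrapper around "nvpmodel -q".
show_power_mode() {
    if command -v nvpmodel >/dev/null 2>&1; then
        sudo nvpmodel -q
    else
        echo "nvpmodel not found (is this a Jetson?)"
        return 1
    fi
}

# Assumption: jetson-stats (which provides jtop) is usually installed with
#   sudo -H pip3 install -U jetson-stats
# Check the project's README for the current instructions.
```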

Something didn’t work, Trouble-shooting:

I’ve spent 4 solid days doing my best to debug these instructions and scripts, and each time something failed, I corrected the instructions or scripts and continued. Sometimes, for whatever reason, one of the pip installs fails, so you have to run it again. If a library is not found once Jupyter notebook is up and running, look for the correct pip3 command in the 2_setup_fastai_pip.sh file.

So the best I can say is “It works for me.” If it doesn’t work for you, try to figure it out using google, because that’s what I did. I didn’t create any of these libraries or tools, so if something fails I probably can’t help you. So don’t be mad if I don’t reply to trouble-shooting queries. If something doesn’t work AND you fix it, tell me how you did it so I can amend these instructions and scripts.

I’ve also found that the tabular examples hang even with small batch sizes. I think this is because the ZRAM compression (swap file) runs out of memory while loading batches. If someone finds a work-around, let me know. Maybe a fixed swap file will help.
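
For anyone who wants to try the fixed swap file idea, here is a sketch of the usual Linux steps. It is untested as a fix for the hang above, and it only helps CPU-side memory since CUDA doesn’t use swap. The function just prints the commands so you can review them before running anything; the function name, the /swapfile path and the 4G size are assumptions.

```shell
# Sketch: print (not run) the standard commands for creating a fixed
# swap file. Path and size are placeholders; review before executing.
swapfile_commands() {
    local path="${1:-/swapfile}" size="${2:-4G}"
    printf 'sudo fallocate -l %s %s\n' "$size" "$path"
    printf 'sudo chmod 600 %s\n' "$path"
    printf 'sudo mkswap %s\n' "$path"
    printf 'sudo swapon %s\n' "$path"
}

# Example: swapfile_commands /swapfile 4G
```

A swap file enabled this way lasts until the next reboot; making it permanent needs an /etc/fstab entry.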

A Note about changes:

As of May 2020, this hacky install method works and installs the latest versions of both pytorch 1.5 and fast.ai 2.0, but things change. In the future you will have to update one or more packages or fast.ai itself. Hopefully some clever soul will figure out how to do that and maybe even build a Git repo. My work here is done.

Is the board fast enough to train Models? I am still trying to find a cheap way to improve my workflow. How is the Performance compared to a K80 for example?

Don’t have a K80, and have never used AWS, Colab or Paperspace. I have other machines that run Ubuntu 18.04 LTS with two GTX-1080s. Compared to them, the NX trains much more slowly, and I’ll be publishing some stats as I get more time in with this machine.

Thank you for your work.

I was already motivated to dive in on the Jetson Xavier NX, and I believe you have left me with few remaining doubts.

Have you tried the Transfer Learning toolkit from Nvidia?
Peoplenet?

Thank you,

Not yet, I’m working on getting Fast.ai V2 on the Jetson Nano right now. I’d like to get a comparison table for the Nano vs. the NX vs. the GTX 1080 using the Lesson 1 01_intro notebook. And I’m still working on the swap file issues created by ZRAM.

I have worked with the TLT for a few days. Some of my experiences:

  • They use KITTI format and I’ve found it cumbersome to get my dataset converted to this format.
  • From KITTI you then create TFRecords (but still need the KITTI images as well).
  • I ended up converting to KITTI from COCO JSON format and PASCAL VOC and then following their tutorial for DetectNet v2.
  • I only had about 100+ images for labeling which I augmented to something over 300 images first and then used the augmentation settings from the TLT configuration file for further augmentation.
  • Pruning didn’t make my network any smaller (or faster).
  • It’s a lot of effort IMO. fast.ai is much more fun in comparison.
  • I’m committing to my Docker image in between to save some files that are not in the mounted host directories. The image has some issue when starting the docker container consecutive times which I have addressed in this post on the Nvidia Dev forums.

My next steps will be converting it for Jetson and testing on the device for speed. I also want to check the YOLO they provide.

Hi,
Were you able to run Fastai2 on jetson Nano?

I haven’t been on in quite a while (work gets in the way), so I’m really not sure. You can try the instructions for the Xavier NX as they’re almost identical to the Nano and see. I was able to get the earlier version of FastAI to work using instructions I posted earlier. Right now I don’t know exactly where those are, but if you search these forums you’ll find them.
