Fastai on Xeon Phi cluster: a guide (work in progress)

I just got access to a (remote) supercomputer full of Xeon Phis. No CUDA. If you’re shopping for new hardware, a Phi is not recommended for Deep Learning (e.g. see Tim Dettmers’ blog), but if somebody offers you a huge machine for doing parameter surveys on, it’s worth a shot. The Phi is not a GPU, but Intel has been doing a lot of work (both on the technical side and in marketing) to offer it as a legitimate architecture for fast Deep Learning applications.

I searched the forums for info on this situation and turned up nothing, so I’m starting a thread to inform others about it. If anybody has more experience with this than I do, please reply/edit/correct this post appropriately.

Assumptions: You’re an ordinary user without superuser access, and on the Phi system you’re running Linux with bash. (Also: I’m doing all this from my Mac laptop via Terminal.)


A lot of big (supercomputer) systems work via “modules”, which may not be loaded when you first log in to the machine. For example, I couldn’t even find Python 3 anywhere at first.

Furthermore, these systems have expert support staff who build binaries “for a living.” So save yourself some headache and find out what’s there already.

To see what’s available, run “module avail”

$ module avail

…in my case, there’s definitely python3, and anaconda3, an old version of PyTorch (0.4), but nothing more recent. (There’s also a recent Tensorflow and Keras, which is nice for other projects, but not this one.)

So we’re going to have to install PyTorch and fastai.

I.A. Ask the staff
Sure, you don’t want to be annoying, but the staff are probably happy to help out rather than have you go off & do your own thing, even if it’s only to say “Have you read our documentation on…?” (Note: The docs at some places are often poorly indexed and hard to search, or they only apply to some machines & not others, so don’t feel too bad if you couldn’t find the one document you needed.)
A simple direction from staff can save you a day of re-compiling, and they may even offer to build it for you. e.g.

“If you don’t see a software package installed in our environment that you would like to use, please let us know…”

That said, obviously, do read the docs for your system (e.g. about job submission, allocations, etc.).


It may prove necessary to re-build everything from source later to get better performance, but to just get “up and running” at first, the HPC expert for my system that I talked to advised trying pip first. (“I use pip and anaconda for my own work” he said.)

So, since the Xeon Phi is not a GPU, we’ll follow the “CPU” instructions for the fastai library.

But to do that, we need some kind of environment. And Intel has their own Anaconda, so let’s go with that and then use the pip inside that environment…

II.1 Environment installation:

I started here: Intel Distribution for Python (powered by Anaconda):

“Register and Download” (note that you have to turn off ad blockers and anti-tracking plugins in your browser in order to see the form.)

…they’ll send you an email with a link. This is NOT the download link.

Open this in your browser, then when you see the buttons at the bottom (“e.g. Python 3.6 for Linux”), you can right-click, select “Copy Link Address”.

In your shell on the machine with the Phis, paste the URL into curl, e.g.,

$ curl -LO

…it’s about 670 MB.

$ tar xvfz <the_file_that_you_just_downloaded>

…this puts everything in ./intelpython3

More from the official Intel Installation Notes:

  1. Change directory to intelpython2 or intelpython3 (depending on the version you’ve downloaded)
  2. Run from shell: bash

Ok great. So NOTE that this directory (e.g. ~/intelpython3) becomes your “anaconda” directory. You do not need a separate Anaconda distribution, you are using it (Intel’s) already.

  3. When the installation completes, activate your root Intel python conda environment:
  • To modify only your current shell, use the source ./bin/activate command
  • To modify all future logins, do one of the following:
    • Add source <install>/bin/activate root to your .bashrc (bash) or other logon script
    • Manually add the <install>/bin directory to your PATH
  • To ensure your environment points to Intel Distribution for Python, run the which python command.

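Ok, sure. Intel says to do one of the two, but I did both. Edit ~/.bashrc to include the line

export PATH=$HOME/intelpython3/bin:$PATH

(note that PATH takes the bin directory itself, not the activate script inside it) and to end with the line

source $HOME/intelpython3/bin/activate

…and then reload it in your current shell:

$ source ~/.bashrc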

And check:

$ which python


Sweet. We have python!
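
If you want a second opinion from inside Python itself, something like the sketch below might help. It assumes the Intel distribution was unpacked into ~/intelpython3 as described above; the helper name is_under is just something I made up for illustration.

```python
import os
import shutil
import sys


def is_under(path, root):
    """Return True if `path` lives inside the directory `root`."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    return os.path.commonpath([path, root]) == root


# Assumption: the Intel distribution lives in ~/intelpython3 (adjust if not).
intel_root = os.path.expanduser("~/intelpython3")

print("running interpreter:", sys.executable)
print("python on PATH:     ", shutil.which("python"))
print("inside intelpython3:", is_under(sys.executable, intel_root))
```

If the last line prints False, the shell is probably still picking up a system or module-provided Python ahead of the Intel one.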

II.2 Fastai “CPU build”

Technically, a Xeon Phi is a “CPU”…or at least it’s not a “GPU”, so we’ll be following “CPU” instructions.

First, let’s just check that we’re using the pip we think we are:

$ which pip


Ok. …Now we just follow the CPU build instructions

The fastai instructions say to install pytorch and torchvision, but to

“…Just make sure to pick the correct torch wheel url, according to the needed platform, python and CUDA version, which you will find here.”

So for example, to get the latest non-GPU, Python 3.6 version for Linux, this means we run…

$ pip3 install
$ pip3 install
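
Once the two installs finish, a quick way to see what you actually got could look like this. It is only a sketch: it reports the PyTorch version, whether CUDA is visible (on a Phi-only machine you would expect False), and how many CPU threads PyTorch plans to use. The function name describe_torch is hypothetical.

```python
import importlib.util


def describe_torch():
    """Summarize the installed PyTorch build, or say that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch
    return "torch {} | CUDA available: {} | CPU threads: {}".format(
        torch.__version__,
        torch.cuda.is_available(),
        torch.get_num_threads(),
    )


print(describe_torch())
```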

…That worked. And get the Jupyter extensions:

$ pip install jupyter notebook jupyter_contrib_nbextensions

…and that worked. Let’s try it!

On the remote machine:

$ jupyter notebook --no-browser --port=8889

On my laptop:

$ ssh -N -n -L 8889:localhost:8889 <myusername>@<remote machine>

Pull up a browser, paste in the tokenized URL that jupyter printed and…

…ohhh, that didn’t work. In my case, this computer system has an ‘alias’ for the computer name, but it dumps you onto a different specific login node every time. So it turned out that the jupyter notebook was running on one login node and the ssh port-forward ended up on a different login node. So, make sure you use the same specific login node for both the jupyter server and the port-forwarding.
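
One way to avoid guessing which node you landed on is to ask the machine for its own name and build the forwarding command from that. The sketch below assumes the specific login node is reachable from outside under the name it reports for itself (that may not hold on every system), and "myusername" is a placeholder. Run it on the remote machine, in the same session where you will start Jupyter.

```python
import socket


def forward_command(user, node, port=8889):
    """Build an ssh command forwarding local `port` to the same port on `node`."""
    return "ssh -N -n -L {p}:localhost:{p} {u}@{n}".format(p=port, u=user, n=node)


# The name of the node this session is actually running on,
# as opposed to the cluster-wide alias you used to log in.
node = socket.getfqdn()
print("Jupyter will run on:", node)
print("On the laptop, run: ", forward_command("myusername", node))
```

The port defaults to 8889 only because that is the port passed to jupyter above.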

Ok, got that sorted, and we can connect to the right notebook server. Let’s grab the fastai lessons…

$ git clone

Go to my local browser showing the Jupyter server, and click on the lesson-1-pets notebook from the course, run the cells…


…but it’s amazingly slow. As in, anything you do – any assignment, import, or Jupyter inline directive – takes 5 to 10 seconds.

EDIT: The real reason it was incredibly slow is that I was running on a login node, and the login nodes are deliberately resource-limited.

To get proper computational resources, you’ll probably need to get to know the job-submission systems on your machine (e.g., Slurm). It may be possible to use this system to request/schedule an interactive job and thereby run Jupyter Notebooks. It will depend on your system whether you can get an interactive job to run quickly or whether it won’t start until, say, 2am when a free node is available.
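
To tell whether a script or notebook is running inside a scheduled job or on a login node, a rough check like the following might be useful. It assumes Slurm, which sets the SLURM_JOB_ID environment variable inside jobs; other schedulers use different variables. Also note that login-node limits are not always visible as CPU affinity, so treat the CPU counts as a hint only. The function name node_report is hypothetical.

```python
import os
import socket


def node_report(environ=os.environ):
    """Collect a few hints about where this process is running."""
    job_id = environ.get("SLURM_JOB_ID")  # set by Slurm inside a job
    if hasattr(os, "sched_getaffinity"):
        # Linux only: CPUs this process is actually allowed to run on.
        usable = len(os.sched_getaffinity(0))
    else:
        usable = os.cpu_count()
    return {
        "host": socket.gethostname(),
        "slurm_job_id": job_id,
        "inside_slurm_job": job_id is not None,
        "cpus_total": os.cpu_count(),
        "cpus_usable": usable,
    }


for key, value in node_report().items():
    print("{:>17}: {}".format(key, value))
```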

For now, I’ll stick to non-interactive Python scripts. So, I hope to follow up later about using the fastai library, but not the notebooks.

…to be continued (or edited by more knowledgeable people)