Running, Deploying Fastai successes: Links, Tutorials, Stories

I’m interested in learning about the ways fastai models can be deployed and run on various (edge) systems. Maybe this will be useful to many and turn into a wiki. I have properly done NONE of these myself, but I’m starting by collating, under headings, forum posts (which may link to repos and blog posts) and external links.

Why at the Edge?
Your smartspeaker sits there using relatively little power, running a ‘cheap’ microcontroller, locally inferring if the wake word is spoken (Alexa, Siri, Hey Google, Jeremy… if you’ve customized it). Your phone can help with text entry as you type, not a few seconds later. Low latency, lower data transfer (lower power) and increased privacy are just some of the reasons why neural nets at the edge are of interest.


Topics

  1. Fastai -> Pytorch -> ONNX
  2. Fastai -> Browser
  3. Fastai -> iOS
  4. Fastai -> Android
  5. Fastai -> Jetsons
  6. Fastai -> Microcontrollers
  7. Fastai -> ONNX elsewhere

Pytorch to ONNX

Fastai is a library built on Pytorch that provides a lot of framework, tips and tricks for quickly and flexibly building and training models. The notebooks regularly run predictions or batch inference, but a notebook is often not the environment where the model ultimately needs to be deployed.

What is ONNX? Open Neural Network Exchange
Website | GitHub | Wikipedia

ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers

In other words, ONNX is a shareable format for neural network models usable by

  1. Frameworks, e.g. Pytorch and Tensorflow can export models to ONNX
  2. Hardware / developer environments, e.g. mobile phones (Android, iPhone), TPUs, Jetsons, etc.
  3. Languages, e.g. Python, Java, C, C++, C# - these can call an ONNX runtime via its API even when a framework isn’t supported directly in that language

fastai tips/tricks/issues?

  • It’s important to realise that ONNX only shares the model. If your pipeline relies on input transformations, these will need to be ported to the deployment environment (see the sketch after this list), e.g. this forum post
  • is flatten() still an issue? Developer chat
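For example, a minimal sketch of the export step (assuming fastai v2 and a vision Learner named `learn`; the 224x224 input and "model.onnx" file name are placeholders):

```python
# Minimal sketch: export the PyTorch model inside a fastai Learner to ONNX.
import torch

model = learn.model.cpu().eval()       # the underlying nn.Module, not the Learner
dummy = torch.randn(1, 3, 224, 224)    # example input matching the model's expected shape

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,                  # pick an opset your target runtime supports
)
# Only the network itself is exported; resizing/normalization from the fastai
# pipeline must be re-implemented on the deployment side.
```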

Pytorch to browser

Is tensorflow.js worth mentioning?
Forum Posts


Pytorch to iOS

Forum Posts


Pytorch to Android

Forum Posts

Pytorch to Jetsons

One thing to note with the Jetson is that the TX2 and Nano (and presumably the others) run on an AArch64 architecture, which is not supported by Conda. So you want to be familiar with virtualenv and Linux to use them.
nano getting started | course
nvidia deep learning frameworks
Forum Posts
I haven’t found a recent forum post - don’t follow old ones!

Pytorch to Microcontrollers

It feels naughty… but TF Lite (TensorFlow Lite) is the only framework I know of.
But there’s a podcast of this being done on an integer-only ARM M0.
Forum Posts

Pytorch to ONNX elsewhere

https://microsoft.github.io/onnxruntime/ | Blog
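A minimal sketch of running an exported model with the onnxruntime Python package (file name and input shape are placeholders; the preprocessing must match whatever fastai did at training time):

```python
# Minimal sketch: inference on an exported ONNX model with onnxruntime.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")
input_name = sess.get_inputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for a preprocessed image
outputs = sess.run(None, {input_name: x})
print(outputs[0].shape)
```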

Forum Posts


Community - please add suggestions in the comments. As indicated, I’m happy for (and hoping) this to become a wiki for you to edit.

Best resource for Custom Edge Inference in iOS would be:

Tips on using Pytorch -> ONNX -> CoreML as well as creating custom converters, plus optimisations on CoreML / iOS side.
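A rough sketch of the ONNX -> CoreML step using the onnx-coreml package (hedged: the exact arguments are from memory, and this converter has since been folded into coremltools, so check the current docs):

```python
# Hedged sketch: converting an exported ONNX model to a CoreML .mlmodel.
# "model.onnx" is a placeholder from the export step above.
from onnx_coreml import convert

mlmodel = convert(model="model.onnx")
mlmodel.save("model.mlmodel")
```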


I created a Google Colab sample for converting a CycleGAN generator to CoreML via ONNX.


Hey,
I needed to train a document image classifier at work, so I used the fastai library to train the model, then exported it using “jit”. (I didn’t install the fastai library in our deploy container because it’s really heavy and has too many dependencies.)
For some reason there was a gap between the results I got when I loaded the images using fastai’s methods and when I did it manually.
After some investigation I found that the gap came from the “resize” function I used (from the PIL library) versus the way fastai does it (using tensors).


Thanks for this. I recall now that Jeremy at some point said JIT was unusable, but I did see him on Twitter later say he’d given it another go and got speed benefits. I’d forgotten.

To include it as ‘help to deploy’ information, what would we need here?
Torch Jit
torch.jit can be used to turn Pytorch/Fastai/Python code into serialized (faster) code that can be run away from Python and its other dependencies (e.g. in a C++ program)
… ???
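Something like the following sketch might be the minimal “how”: tracing the PyTorch model held by a fastai Learner and loading it back without fastai installed (the input shape is an assumption):

```python
# Sketch: serializing the model inside a fastai Learner with torch.jit,
# so it can be loaded without fastai (or from C++ via libtorch).
import torch

model = learn.model.cpu().eval()
example = torch.randn(1, 3, 224, 224)      # representative input shape (assumption)

traced = torch.jit.trace(model, example)   # record the ops executed on the example
traced.save("model_traced.pt")

# In the lightweight deployment container (no fastai installed):
loaded = torch.jit.load("model_traced.pt")
with torch.no_grad():
    preds = loaded(example)
```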

I used Intel OpenVINO for faster inference of my mobilenetv2 on an Intel CPU. Reduced my inference time from 200 ms (using naive learn.predict) to 25 ms! Best part: my model was still in FP32.
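For reference, a hedged sketch of the OpenVINO inference side, assuming the model was first converted to IR files with the Model Optimizer and using the older openvino.inference_engine API (names may differ in newer OpenVINO releases):

```python
# Hedged sketch: CPU inference with OpenVINO's Inference Engine.
# model.xml/model.bin are the IR files produced by the Model Optimizer.
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))
x = np.random.rand(1, 3, 224, 224).astype(np.float32)   # preprocessed image stand-in
result = exec_net.infer(inputs={input_name: x})
```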


For a hint at an example, look at how the fastai2 Mish activation function is done. (It uses JIT.)
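A simplified sketch of the pattern (fastai’s real implementation also scripts the backward pass; this only shows a scripted forward):

```python
# Simplified sketch: scripting an activation like Mish with torch.jit.script.
import torch
import torch.nn.functional as F

@torch.jit.script
def mish(x: torch.Tensor) -> torch.Tensor:
    return x.mul(torch.tanh(F.softplus(x)))
```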

For running inference on your GPU from C++ applications, I would suggest using OpenCV once you have exported your ONNX model from Pytorch (OpenCV 4.2.0 has a CUDA DNN backend).
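A hedged Python sketch of that route (the C++ API mirrors these calls almost one-to-one; requires OpenCV >= 4.2 built with CUDA support, and “model.onnx” / the 224x224 blob size are placeholders):

```python
# Hedged sketch: load an ONNX model with OpenCV's DNN module on the CUDA backend.
import cv2

net = cv2.dnn.readNetFromONNX("model.onnx")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

img = cv2.imread("test.jpg")
blob = cv2.dnn.blobFromImage(img, scalefactor=1/255.0, size=(224, 224), swapRB=True)
net.setInput(blob)
out = net.forward()
```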

Turns out the fastbook draft chapter 2 talks about basic deployment using Voila from Jupyter and Binder for free hosting.
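A condensed sketch of that chapter-2 style app, assuming an exported export.pkl and ipywidgets 7.x as used in fastbook at the time:

```python
# Sketch: load an exported Learner and classify an uploaded image with ipywidgets,
# served as a simple web app via Voila.
from fastai.vision.all import load_learner, PILImage
from IPython.display import display
import ipywidgets as widgets

learn = load_learner("export.pkl")

btn_upload = widgets.FileUpload()
btn_run = widgets.Button(description="Classify")
out = widgets.Output()

def on_click(_):
    img = PILImage.create(btn_upload.data[-1])   # ipywidgets 7.x style, as in fastbook
    pred, pred_idx, probs = learn.predict(img)
    out.clear_output()
    with out:
        print(f"{pred} ({probs[pred_idx]:.2%})")

btn_run.on_click(on_click)
display(widgets.VBox([btn_upload, btn_run, out]))
```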


I’m pretty sure we can do this fastai-esque in less than 3 years without all having PhDs :slight_smile:

https://www.eetimes.eu/ide-brings-ai-training-to-mcus-for-the-first-time/

I had the same issue. Use torchvision.transforms.functional.resize; that gave me identical predictions to using fastai’s resize.


I just discovered that torchvision's resize is the same as PIL's resize(size, resample=2).

resample=2 resizes using “bilinear” interpolation, whereas PIL's resize defaults to “nearest” interpolation, which is what caused the mismatch.
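A quick way to see the mismatch (paths and sizes are placeholders):

```python
# Compare PIL's default resample with bilinear and with torchvision's resize.
from PIL import Image
import torchvision.transforms.functional as TF

img = Image.open("test.jpg")

a = img.resize((224, 224))                            # PIL default resample
b = img.resize((224, 224), resample=Image.BILINEAR)   # matches fastai/torchvision behaviour
c = TF.resize(img, (224, 224))                        # torchvision, bilinear by default

# a generally differs from b and c, which is enough to shift model predictions.
```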