Minimal Packaging for Inference

Looking for some general guidelines on how to deploy a fastai model for inference.

Assume I want to use a Raspberry Pi with limited resources to run the grizzly vs teddy bear classifier. How do I go about setting up the environment? Is there any way around installing the entire fastai library? Is there any way to install only those parts of the library that I’m using? How do I identify which parts of the library I am using?

I’m not looking for concrete steps, just some general advice on how I might go about setting up a minimal environment for inference. An example would be helpful.

Thanks!

bump!

Hello,

First, consider what your data-processing steps are. More often than not, PIL suffices: it includes many, many types of image transforms, from simple ones like resizing to more complex ones like applying a custom kernel. So PIL is required (unless you use another library such as OpenCV, though I’d advise against that for basic tasks like cropping), but it is also often enough, and you don’t need any other dependencies.
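
For instance, a rough sketch of the whole preprocessing step with just PIL and NumPy (assuming the usual 224×224 input and ImageNet normalisation stats; check your own DataBlock/transforms for the real values):

```python
from PIL import Image
import numpy as np

# ImageNet stats -- only an assumption; use whatever your training pipeline used.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(path, size=224):
    img = Image.open(path).convert("RGB").resize((size, size))
    x = np.asarray(img, dtype=np.float32) / 255.0   # HWC, values in [0, 1]
    x = (x - MEAN) / STD                            # per-channel normalisation
    return x.transpose(2, 0, 1)[None]               # -> NCHW batch of one
```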

For making predictions with the model, you don’t need fastai; PyTorch alone will do because, remember, the underlying models used by fastai are all PyTorch models (accessible via learn.model). After processing your image, you can feed it straight to the PyTorch model, with no need for fastai.
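
One way to ship just the PyTorch part (this is my own suggestion, not something fastai gives you out of the box) is to export the model with TorchScript on your training machine, so the Pi only needs torch to load it. A sketch, assuming learn is your trained Learner and preprocess is the PIL helper above:

```python
import torch

# On the training machine: trace the underlying PyTorch model.
model = learn.model.eval()
traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
traced.save("bears.pt")

# On the Raspberry Pi: plain torch only, no fastai.
model = torch.jit.load("bears.pt").eval()
with torch.no_grad():
    logits = model(torch.from_numpy(preprocess("teddy.jpg")))
```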

The only caveat is that there are a few post-processing steps you may have to perform. If your task is single-label classification, the model likely outputs logits that have to be turned into probabilities with a normalizer like the softmax function. In the case of image-to-image translation, the output of the model would have to be denormalized and potentially scaled to the [0, 255] range. Whatever the case, though, none of it is difficult, and only a few extra lines of code are needed.
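
For the bear classifier, the post-processing is literally two or three lines (the class names and their order below are just a guess; they must match learn.dls.vocab):

```python
import torch

classes = ["grizzly", "teddy"]                  # must match your Learner's vocab
probs = torch.softmax(logits, dim=1)[0]         # logits -> probabilities
print(classes[int(probs.argmax())], float(probs.max()))
```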

Another option is ONNX. It requires converting the PyTorch model into an ONNX model (occasionally a headache), but the upside is that it is superbly fast and efficient. The pre- and post-processing steps remain the same, but they would need to be done in NumPy rather than PyTorch, because ONNX models accept and return NumPy arrays.
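
Roughly, the export and inference would look like this (the input/output names are arbitrary choices of mine, and learn/preprocess again come from the earlier snippets):

```python
import numpy as np
import torch
import onnxruntime as ort

# Once, on the training machine: convert the PyTorch model to ONNX.
torch.onnx.export(
    learn.model.eval(),
    torch.randn(1, 3, 224, 224),
    "bears.onnx",
    input_names=["input"],
    output_names=["logits"],
)

# On the Pi: onnxruntime + NumPy only.
session = ort.InferenceSession("bears.onnx")
x = preprocess("teddy.jpg").astype(np.float32)
(logits,) = session.run(None, {"input": x})

# Softmax in NumPy, since there's no torch on this side.
e = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)
```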

The bottom line is that fastai is by no means necessary (although it does make life easier), and you have two other options: PyTorch (which pulls in NumPy as a dependency, though you can usually remove it), or ONNX + NumPy. The former is easier to use, whereas the latter is faster and leaves a smaller memory footprint, at the expense of generally being more difficult to deal with, particularly when the model includes lesser-known layers.

Good luck!

Thanks for the detailed response! I deliberately left preprocessing out of the scope of the question for the sake of simplicity, but your insight about ONNX being faster and lighter on memory is exactly what I was looking for. It had always bothered me that I had to install an entire DL library (be it PyTorch or fastai) just to run inference.

Just for the sake of argument, if I didn’t want to use ONNX, is there a subset of PyTorch, or any other lightweight library, that can be used simply to compute the outputs of a model? I know the answer will almost certainly depend on the kinds of layers in the model, but humour me for a moment.

Apologies for the late response, I was away for the weekend. Happy to help!

To the best of my knowledge, there is no straightforward way to do that unless you’re willing to put in tons of extra work, because, as you mentioned, you would need to figure out which parts of PyTorch your model is using and install only those pieces on your Raspberry Pi. I have actually done something similar with a number of lightweight, simple libraries written purely in Python, but PyTorch is a Goliath with a C++ backend, so it’d be leagues more difficult to do something like that with it.

Cheers!

Thanks for taking the time. Much appreciated.

There is also a speed/accuracy tradeoff that is particularly acute on a low-resource platform like the RPi. You might consider converting your trained model to an optimised format like TensorFlow Lite. I created a deployment solution at https://github.com/robmarkcole/tensorflow-lite-rest-server
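
Once you have a .tflite file, the tiny tflite_runtime package is all you need on the Pi. A rough sketch (the model path and input shape are just examples, and the PyTorch → TFLite conversion itself isn’t shown):

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="bears.tflite")  # already-converted model
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# TFLite vision models usually expect NHWC float32 input -- adjust to your model.
x = np.random.rand(1, 224, 224, 3).astype(np.float32)
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
logits = interpreter.get_tensor(out["index"])
```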