Exporting a model for local inference

Exporting to ONNX format and using ONNX Runtime, an open-source, cross-platform, efficient runtime for DNN inference, is another option to consider. It supports both CPU and GPU inference. The runtime is written in C++ but has language bindings for Python, C#, Java and others, so it can integrate well with your application.
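
For illustration, here is a minimal sketch of the ONNX Runtime Python API: it lists the execution providers (CPU, CUDA, etc.) available in the installed onnxruntime build and creates an inference session from a model file. The file name "model.onnx" is an assumption matching the export snippet further down.

import onnxruntime as ort

# Show which execution providers (e.g. CPUExecutionProvider, CUDAExecutionProvider)
# are available in this onnxruntime build
print(ort.get_available_providers())

# Create an inference session using the available providers
session = ort.InferenceSession("model.onnx", providers=ort.get_available_providers())
print([i.name for i in session.get_inputs()])   # names of the model's input tensors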

Here are some instructions on how to deploy an ONNX model to Azure and run predictions using the relatively lightweight ONNX Runtime. The main prediction code is quite generic and can be used on any platform (Linux, Windows, Mac) for local inference. You don't need PyTorch or the fast.ai library if you are using ONNX, so the deployment package is much smaller.
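
As a rough sketch of what such generic prediction code could look like, the function below uses only NumPy, Pillow and onnxruntime, with no PyTorch or fast.ai dependency. It assumes a 224x224 image classifier exported as "model.onnx" and ImageNet-style normalisation, as in the bear detector example; the function name and paths are illustrative.

import numpy as np
from PIL import Image
import onnxruntime as ort

def predict(image_path, model_path="model.onnx"):
    # Preprocess with PIL/NumPy only -- no PyTorch or fast.ai at inference time
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    x = np.array(img, dtype=np.float32) / 255.0
    # Normalize with ImageNet statistics (assumed to match the training transforms)
    x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    x = x.transpose(2, 0, 1)[np.newaxis].astype(np.float32)   # NCHW, batch of 1

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: x})
    return int(np.argmax(outputs[0]))   # index of the predicted class

The returned index can then be mapped back to your class labels (for example, the class list saved from the fast.ai data object at training time).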

You can export your fast.ai/PyTorch model's graph and weights to ONNX with the following snippet (from the bear detector example) after training.

import torch

# Dummy input with the model's expected shape (batch, channels, height, width)
dummy_input = torch.randn(1, 3, 224, 224, device='cuda')
onnx_path = "./model.onnx"
torch.onnx.export(learn.model, dummy_input, onnx_path, verbose=False)
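
Optionally, you can sanity-check the exported file with the onnx package before deploying it; this quick verification step is an addition here, not part of the original snippet.

import onnx

# Load the exported file and verify that the graph is well formed
model = onnx.load("./model.onnx")
onnx.checker.check_model(model)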