I have a different approach to this: I am using fastai directly in production on my own server.
I load the model in one process and keep it loaded, using a dataloader that points at dummy files in the train and test directories. When I receive images, I delete the test directory, write the new images to disk, and refresh the model with a new dataloader. This way I get all the augmentation steps and can run TTA when predicting. I am sure this is a slow way to do it, although it is on an NVMe disk. Round-trip time is 700ms (not using TTA).
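In case it helps, here is a minimal sketch of that request flow. It is not my actual server code: the `load_model` function is a toy stand-in for `load_learner` plus the dummy-file dataloader (the exact fastai calls depend on your version), but the delete-directory / write-images / refresh-and-predict structure is the same.

```python
import os
import shutil
import tempfile

# Hypothetical stand-in for a fastai Learner loaded once at process start.
# In the real setup this would be load_learner(...) with a dataloader
# pointing at dummy files; here the toy "model" just returns file sizes.
def load_model():
    return lambda path: os.path.getsize(path)

MODEL = load_model()  # keep the model resident between requests
TEST_DIR = tempfile.mkdtemp(prefix="serve_test_")

def handle_request(images):
    """images: dict of filename -> bytes received from the client."""
    # Wipe the test directory and write the incoming images to disk,
    # mirroring the delete / write / refresh flow described above.
    shutil.rmtree(TEST_DIR)
    os.makedirs(TEST_DIR)
    for name, data in images.items():
        with open(os.path.join(TEST_DIR, name), "wb") as f:
            f.write(data)
    # "Refresh the dataloader": re-enumerate the directory, then predict.
    paths = sorted(os.path.join(TEST_DIR, n) for n in os.listdir(TEST_DIR))
    return [MODEL(p) for p in paths]
```

The point of the pattern is that `load_model()` runs once, so each request only pays for the disk writes and the forward pass, not for reloading weights.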
My method is based on this article: https://www.pyimagesearch.com/2018/02/05/deep-learning-production-keras-redis-flask-apache/