Fastai / PyTorch in Production

Thanks for sharing. Keep us informed about how it goes :slight_smile:

No problem, will do!

I have a different approach to this: I am using fastai directly in production on my own server.

I load the model in one process and keep it loaded, using a data loader that points at dummy files in the train and test directories. When I receive images, I delete the test directory, write the images to disk, and refresh the model with a new data loader. This way I get all the augmentation steps and can run TTA when predicting. I am sure this is a slow way to do it, even though it runs on an NVMe disk. Round-trip time is 700 ms (not using TTA).

My method is based on this article:


Interesting. Seems like a solid way to do TTA; I could look into adding support for TTA in the repo I set up. I’d recommend you try out the Serverless approach though: it’s easy to maintain, and it costs about $1 to process ~100k requests. Depending on how many requests you handle a month, you may actually be paying more in electricity to power your home server than you would to keep a Lambda up.


Thanks, I will definitely check it out

@nextM I was able to implement TTA, but it seems to have barely any effect (on dogs/cats and 2 other datasets), and it more than doubled the time to return a result. Have you seen enough of an accuracy increase with TTA to make it worth the extra processing time?
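For anyone wondering why TTA costs so much at inference time, here is a minimal sketch in plain PyTorch (not the fastai implementation). It assumes a single extra view, a horizontal flip, and the tiny model and the `predict_with_tta` helper are hypothetical stand-ins. Every extra view is another forward pass, which is roughly where the extra latency comes from.

```python
import torch
import torch.nn as nn


def predict_with_tta(model, batch):
    """Average softmax outputs over the original batch and its horizontal flip."""
    model.eval()
    with torch.no_grad():
        # dim 3 is the width axis for NCHW image batches
        views = [batch, torch.flip(batch, dims=[3])]
        probs = [torch.softmax(model(v), dim=1) for v in views]
    return torch.stack(probs).mean(dim=0)


# Hypothetical stand-in model: 3x8x8 images -> 2 classes
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
batch = torch.randn(4, 3, 8, 8)
tta_probs = predict_with_tta(model, batch)
```

With two views the model runs twice per request, so a 2x slowdown is the floor before any augmentation overhead is counted.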

I’ve been monitoring this thread because my immediate task is to push my GPU-trained fastai model into production. @alecrubin, was your model trained in a GPU or CPU environment? If on GPU, did you take any measures to bring it into the CPU-powered AWS Lambda environment? I am having problems running my GPU-trained model in a CPU environment for inference. Any solutions would be appreciated.


@gandyer You’re not going to believe how easy this is… just call model.cpu() before you save your state dictionary and it should work. Then, if you want to bring the model back to the GPU, just call model.cuda().


@alecrubin Thanks for the tip. Can you provide a simple snippet showing how to save a state_dict, load it, and predict on a single image? Maybe I am saving, loading, and predicting with the model in a different way. Thanks in advance.


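@gandyer To save the state dict from the learner, you’ll need to do this.

# Convert your model to CPU mode if you trained on GPU
# (learn is assumed to be the name of your fastai learner)
learn.model.cpu()

# Save your state dictionary
torch.save(learn.model.state_dict(), 'model.h5')

# If you converted your model to CPU, bring it back to GPU
learn.model.cuda()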

When you load your model in production, it won’t be wrapped in a learner, so that will look something like this.

# Use torch to load your state dictionary from path
state_dict = torch.load('path/to/model.h5', map_location=lambda storage, loc: storage)

# Then load the state dict into your model
model = YourModel()
model.load_state_dict(state_dict)
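If it helps, here is a self-contained round trip you can run to check the save and load steps on any machine. It is only a sketch: `TinyNet` is a hypothetical stand-in for YourModel, and the temp-file path is made up for the example.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for YourModel."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


trained = TinyNet()
path = os.path.join(tempfile.mkdtemp(), "model.h5")
torch.save(trained.state_dict(), path)

# map_location="cpu" remaps any CUDA tensors in the file onto the CPU
state_dict = torch.load(path, map_location="cpu")
restored = TinyNet()
restored.load_state_dict(state_dict)
restored.eval()  # switch dropout/batch norm layers to inference behaviour

x = torch.randn(1, 4)
with torch.no_grad():
    same = torch.equal(trained(x), restored(x))
```

`map_location="cpu"` does the same job as the lambda in the snippet above, and calling `eval()` matters as soon as the real model contains dropout or batch norm.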

To get a prediction from your model, you will need to preprocess your image into a tensor. After that, all you need to do is this.

# This is your image transformed and normalized into a tensor
img = ...  # produced by your own preprocessing pipeline

# Your input needs to be a batch wrapped in a Variable
inp = VV_(torch.stack([img]))

# Then just pass your input to the model and it will return the predictions
out = model(inp)
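On current PyTorch the Variable wrapper is no longer needed (Tensor and Variable were merged in 0.4), so the same steps might look like the sketch below. The `preprocess` helper, the random "image", and the tiny model are hypothetical stand-ins, and the ImageNet statistics are an assumption that only applies if your model was trained with them.

```python
import torch
import torch.nn as nn

# ImageNet channel statistics (assumption: the model was trained with these)
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def preprocess(pixels):
    """Turn an HxWx3 uint8 tensor into a normalized 3xHxW float tensor."""
    img = pixels.permute(2, 0, 1).float() / 255.0
    return (img - MEAN) / STD


# Hypothetical stand-ins for a decoded image and for YourModel
pixels = torch.randint(0, 256, (8, 8, 3), dtype=torch.uint8)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2)).eval()

img = preprocess(pixels)
inp = torch.stack([img])  # batch of one: shape (1, 3, 8, 8)
with torch.no_grad():  # disables gradient tracking during inference
    out = model(inp)
pred_class = out.argmax(dim=1).item()
```

The preprocessing has to match what was used during training (resize, crop, normalization), otherwise predictions will look random even though the weights loaded fine.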

Just an update for anyone who tried to implement this and ran into issues. It turns out that unless you are running Linux on your local machine, you need Docker running when you deploy your function. So setting dockerizePip: true in serverless.yml is required, not optional as the README previously stated.


There’s been some talk of how this will transition to PyTorch 1.0, which is meant to use Caffe2 under the hood. I haven’t tried it, but I think the following should work:

SCAR works with Theano and Darknet, so I think it should be possible to get working.

It would still not have access to any GPUs (yet - I’m sure this is on AWS’ radar)

Also, I should have said this right away, but fantastic work @alecrubin

Thank you for the help. Does anybody know how to get “YourModel()” to a new machine that does not have installed?

Following up on this. Something I didn’t see coming was Lambda supporting other runtimes. Since there is now a C++ runtime for Lambda, it would make sense to export the model in the C++-only format, then load it into Caffe2 in a Lambda function (or use the beta PyTorch C++ library).
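One way the export step could look, as a rough sketch, is TorchScript tracing. This is my assumption about what "the C++-only format" would mean in practice, and the small model below is a hypothetical stand-in. The saved file can be loaded from the PyTorch C++ library with `torch::jit::load`.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for a trained model
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
example = torch.randn(1, 4)

# Tracing records the operations executed on the example input
traced = torch.jit.trace(model, example)
path = os.path.join(tempfile.mkdtemp(), "model.pt")
traced.save(path)

# The saved file holds both architecture and weights,
# so the original Python class is not needed to load it
reloaded = torch.jit.load(path)
with torch.no_grad():
    matches = torch.allclose(model(example), reloaded(example))
```

Tracing only captures the code path taken for the example input, so models with data-dependent control flow would need `torch.jit.script` instead.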

I have used SageMaker to deploy a multi-label text classification model based on fastai. I had to create a custom Docker container that contained the libraries and the inference code. The output is generally received in about 100 milliseconds. Will share snippets of the code soon.

After trying the solutions offered by the big cloud providers, I find deploying fastai and PyTorch to Azure Functions to be the easiest. I wrote a quick deployment guide that can be found on the course pages.

Here’s a thread I created with more info, feel free to leave any comments or feedback from your experience:

Has anybody tried the C++ API for PyTorch?
If so, in your experience, what would you say is the easiest way to deploy?

Hey @ktrivedi, did you share your code? I couldn’t find anything regarding your text classification model.