Exposing DL models as APIs/microservices

Could you please explain how you did the torch.save(model)? I mean, how did you convert the fastai learner into a PyTorch Sequential model?

Thanks everyone for the feedback!

@jm0077, I’ve made a gist to demo the 2 options to save and load a model in pytorch:

It also shows the whole sequence of training a model on GPU, saving the .h5 model file with fastai, loading that .h5 file locally and testing CPU-only predictions, and then the two ways to save and load the model using pytorch only.

Note that I didn’t demo copying the model definition functions into their own module (step 3 above). If you do that (recommended), you should test the module import first before doing the local pytorch save and load model steps.

Also note that fast.ai uses save option 2 (the recommended saving and loading weights via m.state_dict()) under the hood:

In torch_imports.py:

def save_model(m, p): torch.save(m.state_dict(), p)
def load_model(m, p): m.load_state_dict(torch.load(p, map_location=lambda storage, loc: storage))

Hope that’s helpful!


Thanks @daveluo! Your gist gave me a better idea of how models work in pytorch.
However, I have an issue that maybe you can help me with.
In the first part of the training, the learner object is created using:

learn = ConvLearner.pretrained(arch, data, precompute=True, ps=0.5)

When I print that model, it has only 8 layers (indices 0 to 7):

  (0): BatchNorm1d(4096, eps=1e-05, momentum=0.1, affine=True)
  (1): Dropout(p=0.5)
  (2): Linear(in_features=4096, out_features=512, bias=True)
  (3): ReLU()
  (4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True)
  (5): Dropout(p=0.5)
  (6): Linear(in_features=512, out_features=5, bias=True)
  (7): LogSoftmax()

After training is done, the model has 17 layers, which I guess is due to unfreezing the model. The problem is that when I try to save the entire model, I get the following error:

Can't pickle local object 'resnext_50_32x4d.<locals>.<lambda>'

So I tried the 2nd method, saving only the weights instead of the entire model:

torch.save(learn.model.state_dict(), "./torch_model_v1.pt")

That worked, but to load the weights later I need a model to load them into. So how can I get an initialized model with the same architecture (resnext_50) in order to load the weights?

Thanks in advance!


Hi @jm0077,

The Can't pickle local object error you see comes from pickle not being able to serialize a lambda defined inside the resnext_50_32x4d model creation function (from here), as the <locals>.<lambda> part of the message indicates. The middle of this article describes this limitation of pickle: https://medium.com/@jwnx/multiprocessing-serialization-in-python-with-pickle-9844f6fa1812
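To make the pickle limitation concrete, here is a minimal standard-library sketch. The function names are hypothetical and not taken from fastai or the resnext code. A lambda created inside another function gets a qualified name containing <locals>, so pickle cannot look it up by name, whereas a functools.partial over a module-level function pickles without trouble:

```python
import functools
import operator
import pickle


def make_scaler_with_lambda(factor):
    # Returns a lambda defined in a local scope, similar in spirit to
    # 'resnext_50_32x4d.<locals>.<lambda>' from the error message.
    return lambda x: x * factor


def make_scaler_with_partial(factor):
    # A partial over a module-level function is picklable by reference.
    return functools.partial(operator.mul, factor)


def can_pickle(obj):
    # The exception type for local objects differs between Python versions.
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


lambda_ok = can_pickle(make_scaler_with_lambda(2))
partial_ok = can_pickle(make_scaler_with_partial(2))
restored = pickle.loads(pickle.dumps(make_scaler_with_partial(2)))

print(lambda_ok, partial_ok, restored(3))
```

The partial is only there as a contrast case; it is not a claim about how the resnext code should be rewritten.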

What did seem to work is using dill instead of pickle to serialize (torch.save enables this through the pickle_module= argument). Thanks to @ramesh for the offline suggestion to try dill. I did a quick test saving a ConvLearner.pretrained() model with arch=resnext50 using dill, and it seemed to save the entire model, load it successfully after restarting the kernel, and generate predictions correctly and consistently:

import dill
torch.save(learn.model, 'test_resnext50.pt', pickle_module=dill)

I haven’t extensively tested using dill though so can’t promise there won’t be other issues down the line.

If you want to use the 2nd method of saving and loading the weights only, you need to re-initialize your model in the same way you originally defined and created your model when you saved the weights. You have to make sure the variables, classes, functions that go into creating your model are available, whether through module imports or directly within the same script/file.

In the example from my original gist, this looks like:

# model definition stuff
from fastai.conv_learner import *
PATH = Path("data/cifar10/")

stats = (np.array([ 0.4914 ,  0.48216,  0.44653]), np.array([ 0.24703,  0.24349,  0.26159]))

tfms = tfms_from_stats(stats, sz, aug_tfms=[RandomFlip()], pad=sz//8)
data = ImageClassifierData.from_paths(PATH, val_name='test', tfms=tfms, bs=bs)

def conv_layer(ni, nf, ks=3, stride=1):
    return nn.Sequential(
        nn.Conv2d(ni, nf, kernel_size=ks, bias=False, stride=stride, padding=ks//2),
        nn.BatchNorm2d(nf, momentum=0.01),
        nn.LeakyReLU(negative_slope=0.1, inplace=True))

class ResLayer(nn.Module):
    def __init__(self, ni):
        super().__init__()  # required before assigning submodules on an nn.Module
        self.conv1=conv_layer(ni, ni//2, ks=1)
        self.conv2=conv_layer(ni//2, ni, ks=3)
    def forward(self, x): return x.add(self.conv2(self.conv1(x)))

class Darknet(nn.Module):
    def make_group_layer(self, ch_in, num_blocks, stride=1):
        return [conv_layer(ch_in, ch_in*2,stride=stride)
               ] + [(ResLayer(ch_in*2)) for i in range(num_blocks)]

    def __init__(self, num_blocks, num_classes, nf=32):
        super().__init__()  # required before assigning submodules on an nn.Module
        layers = [conv_layer(3, nf, ks=3, stride=1)]
        for i,nb in enumerate(num_blocks):
            layers += self.make_group_layer(nf, nb, stride=2-(i==1))
            nf *= 2
        layers += [nn.AdaptiveAvgPool2d(1), Flatten(), nn.Linear(nf, num_classes)]
        self.layers = nn.Sequential(*layers)
    def forward(self, x): return self.layers(x)

# initialize model
m = Darknet([1, 2, 4, 6, 3], num_classes=10, nf=32)
learn3 = ConvLearner.from_model_data(m, data)

# load weights

In your case, you would create a new learn = ConvLearner.pretrained(...) and load weights with learn.model.load_state_dict().
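The weights-only round trip can be sketched in plain pytorch as follows. This is a minimal illustration, not the gist's code: build_model is a hypothetical stand-in for whatever code originally created your architecture (for a fastai learner, that would be the ConvLearner.pretrained(...) call), and the tiny Sequential is only there to keep the example fast.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # Hypothetical stand-in: must rebuild exactly the same architecture
    # that was used when the weights were saved.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


torch.manual_seed(0)
trained = build_model()
trained.eval()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.pt")
    # Save only the weights (option 2).
    torch.save(trained.state_dict(), path)

    # Later, or on another machine: re-create the model, then load the weights.
    restored = build_model()
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()

x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.equal(trained(x), restored(x))

print(same)
```

map_location="cpu" plays the same role as the lambda in fastai's load_model shown earlier in the thread: it lets weights saved on a GPU load on a CPU-only machine.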


I’d like to suggest an alternative to maintaining servers in the cloud: using serverless infrastructure (AWS Lambda, for example). According to this paper, it is inexpensive and easier to maintain: http://aclweb.org/anthology/N18-5002


There’s a thread, complete with an excellent example github by @alecrubin over here:

Definitely check it out if you’re interested in the topic.


I’m sorry for hijacking the thread, but I wanted to share a different way of deploying a model.

I’m a Computational Environmental Designer by trade, which means I spend a lot of time running environmental performance studies (energy, daylight, thermal comfort, solar radiation, PV, etc.). Our design spaces (or datasets in AI lingo) are very small compared to most datasets you’re used to, but the cost function is usually terribly expensive. An energy simulation for a 4x4 room might take 30 seconds, so you’d need a couple of weeks for 45,000 models, which is a modest dataset.
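As a quick sanity check of that estimate, assuming the simulations run one after another with no parallelism (an assumption; the post doesn’t say):

```python
SECONDS_PER_SIMULATION = 30
NUM_MODELS = 45_000
SECONDS_PER_DAY = 24 * 60 * 60

total_seconds = SECONDS_PER_SIMULATION * NUM_MODELS
total_days = total_seconds / SECONDS_PER_DAY

print(f"{total_days:.1f} days")  # prints: 15.6 days
```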

In one of my experiments in bringing AI to AEC, I’ve been supercharging this parametric design process with ML models. In this case a design is a set of inputs (features) that define different aspects of the building (HVAC system, constructions, orientation, climate zone, etc.). This is built in Grasshopper, a visual algorithmic environment. Once I’ve generated some data and trained my model, I can ‘bring it back’ into Grasshopper and use it as a sort of ‘generator’ of results.

The image above shows inputs being fed and the model automatically predicting performance.

I realize this isn’t as fancy as a REST API, but it can be quite useful in our line of work. In any case, I thought a different approach might be interesting!

On another note, those were models trained on an ensemble of GBMs. For a range of target values between 600-5000, with an average around 2000, I was getting a mean absolute error of 14.9, which was pretty good considering that my training dataset was 20% and I was predicting on the 80%! Now, just today I tried a very (very) quick run of the Entity Embedding implementation in FastAI and I got a 27% reduction in the error (down to almost 10) in just a few minutes, despite the fact that my categorical variables are really ‘shallow’ (about 2-5 different categories usually)! And the model is blazing fast! I really think it has great potential in my field, where structured, tabular data is the norm. What I also love is how beautifully it has captured the variance in the data (image below).

I…think I’ve said about 2000 words too much, not to mention hijacking the thread! If anyone feels this is interesting, or in the way, I’d be glad to move it to a separate thread.

Kind regards,


I am using fastai/courses/dl1/lesson.ipynb and I save the weights like this:

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)

After that, I copied the file to /static/_model/modelweights.h5

But when I run python server.py, it gives the following error:

### start server  2018-07-19 12:01:39.808413

### image upload folder: /home/ubuntu/flask_fastai_CNN/static/_uploads/unknown/

### data folder: /home/ubuntu/flask_fastai_CNN/static/data/redux
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN Mixed dnn version. The header is from one version, but we link with a different version (5103, 7005))
Using Theano backend.

### initializing model: 
Traceback (most recent call last):
  File "server.py", line 52, in <module>
    vgg = Vgg16()
  File "/home/ubuntu/flask_fastai_CNN/utils/vgg16.py", line 32, in __init__
  File "/home/ubuntu/flask_fastai_CNN/utils/vgg16.py", line 84, in create
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 2494, in load_weights
    f = h5py.File(filepath, mode='r')
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/h5py/_hl/files.py", line 269, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/h5py/_hl/files.py", line 99, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (file signature not found)

Please tell me where I went wrong.
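One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: server.py loads the file through Keras (Vgg16 calling load_weights, which opens it with h5py), while fastai writes its weights with torch.save (see the save_model snippet earlier in the thread). If so, the file is not HDF5 despite its .h5 extension, which would explain "file signature not found". A small standard-library check for the HDF5 signature can confirm or rule this out. The file names below are placeholders, and the stand-in file contents are made up for the demo:

```python
import os
import tempfile

# Every HDF5 file begins with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    # Checks offset 0 only; HDF5 files with a user block place the
    # signature at a later offset (512, 1024, ...), which this ignores.
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


with tempfile.TemporaryDirectory() as tmp:
    real_h5 = os.path.join(tmp, "keras_weights.h5")
    not_h5 = os.path.join(tmp, "modelweights.h5")

    with open(real_h5, "wb") as f:
        f.write(HDF5_SIGNATURE + b"\x00" * 16)  # stand-in for a real HDF5 file
    with open(not_h5, "wb") as f:
        f.write(b"not an hdf5 payload")  # stand-in for a torch.save file

    hdf5_result = looks_like_hdf5(real_h5)
    other_result = looks_like_hdf5(not_h5)

print(hdf5_result, other_result)
```

If the check returns False for your modelweights.h5, the file would need to be loaded with pytorch (as in the earlier posts) rather than with Keras.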

I explored this area further while building a real-world data product recently. The design was inspired by Dave’s posts.

Application System Architecture for Data-driven Product

While our application’s user interface demonstrates what is possible, it needs to be loosely coupled to the trained models that do the core predictive tasks.

To preserve a bright-line separation of concerns, we break the overall application down into several constituent pieces. Here’s an extremely high-level view of the component hierarchy:

  • The job of the trained models is to implement these core predictive tasks, and the job of the prediction service that wraps them is to expose those tasks for use. The models themselves shouldn’t need to know about the prediction service, which in turn should not need to know anything about the interface application.
  • The job of the interface backend (API) is to ferry data back and forth between the client browser and model service, handle web requests, take care of computationally intensive transformations not appropriate for frontend JavaScript, and persist user-entered data to a data store. It should not need to know much about the interface frontend, but its main job is to relay data for frontend manipulation, so it’s acceptable for this part to be less abstractly generalizable than the prediction service.
  • The job of the interface frontend (UI) is to demonstrate as much value as possible by exposing functionality that the models make possible in an intuitive and attractive format.
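The layering described in the list above could be sketched roughly like this. It is a minimal illustration with hypothetical class and function names, not the actual product code: the model knows nothing about the service, the service knows nothing about the backend, and the backend only relays JSON.

```python
import json


def toy_model(features):
    # Hypothetical stand-in for a trained model: returns the mean feature value.
    return sum(features) / len(features)


class PredictionService:
    """Wraps a trained model and exposes it; unaware of any web layer."""

    def __init__(self, model):
        self._model = model

    def predict(self, features):
        return self._model(features)


class InterfaceBackend:
    """Handles requests, transforms payloads, persists data, relays results."""

    def __init__(self, service, store):
        self._service = service
        self._store = store  # hypothetical data store; a plain list here

    def handle_request(self, body):
        payload = json.loads(body)
        features = [float(v) for v in payload["features"]]
        score = self._service.predict(features)
        self._store.append({"features": features, "score": score})
        return json.dumps({"score": score})


store = []
backend = InterfaceBackend(PredictionService(toy_model), store)
response = backend.handle_request('{"features": [1, 2, 3]}')

print(response)
```

Swapping toy_model for a different model, or the list for a real database client, would not require touching the other layers, which is the point of the loose coupling.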

Here’s a visual representation of this architecture:


@cedric You should take a look at clipper.ai, which @QWERTY1 recently shared. It’s out of the Berkeley RISE lab and is a very well thought out framework for model serving as an API. The website doesn’t really do the framework justice in my mind, and the videos are definitely worth looking at. It’s very similar to what you’ve laid out, but has a few more details outlined. It looks like you’ve thought of some other aspects as well, so it may be worthwhile joining forces and contributing your ideas/work.

I’m currently trying to convince my company to adopt it for model serving so that we can work on it and help improve it. So far I’ve been very impressed with what it does and with their roadmap.


Hi Even, thank you for sharing. That sounds very interesting. This is my first time hearing about clipper.ai. I have seen Polyaxon before. I have glanced through clipper.ai’s website and you are right, it’s a bit light on information. With that in mind, I headed over to their codebase and took a quick peek at some of the code and Dockerfiles there. So far, my impression is that it’s worth looking at. So, I plan to take a more serious look at it soon and see if I can contribute in some way if time allows.

I see. Good to hear.

Check out the video in the other link. It gives a much more solid overview. Definitely seems worth exploring in detail.


I found this video, which was presented at the AWS London Summit 2018. It doesn’t have many views, so I decided to share it in this post. I think it will be really useful for anyone trying to deploy their fastai models (in AWS at least):

Building, Training and Deploying Custom Algorithms Such as Fast.ai with Amazon SageMaker:


Thanks for your great tutorial! I followed your plan and was able to deploy a skin mole detection web app on DigitalOcean. Two things bothered me a little: 1) I used ResNext50 and had to copy the model into the app folder to get it to work; 2) you need to check that opencv can be imported properly on DO, since I had to install some libs to get it to work.

p.s. github student pack includes $50 credit for DigitalOcean.
web app github: https://github.com/zeochoy/skinapp


@zeochoy Thanks for sharing. I went through your code and found it super helpful for a newbie like me to understand the deployment process. However, I am still not very clear about how you train and save the model. Specifically, could you elaborate or share the code on how you save the fast.ai model as Pytorch model? Did you train your model using fastai library or only pytorch(skinmodel.py)?

Thanks for your help in advance.


Hello, when exporting my model from fastai to pytorch using

torch.save(learn.model, 'unet.pt')
model = torch.load('unet.pt')

I don’t get the same result because of the preprocessing. I am using the preprocessing from tfms_from_model for resnet. Is there a way to import the preprocessing into pytorch, or to get the preprocessing used for resnet?
Please tell me if I am doing something wrong.
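On the preprocessing question, here is a minimal sketch of the normalization step, assuming that tfms_from_model for a resnet scales pixels to [0, 1] and then normalizes with the standard ImageNet per-channel statistics. That is an assumption about the fastai version in use, so check it against your own tfms. Resizing and cropping are not shown, and normalize_pixel is a hypothetical helper that handles a single RGB pixel in plain Python:

```python
# Assumed ImageNet per-channel statistics (RGB order) for inputs in [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Map one (R, G, B) pixel with 0-255 values to normalized floats."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(s - m) / d for s, m, d in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))

print(white, black)
```

On a full image you would apply the same per-channel arithmetic with array operations and move the channel axis first before feeding the tensor to the pytorch model.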

I have exactly the same question you have, but on a modified unet (based on the carvana one). Did you manage to solve it?

Hey, I want to deploy an NLP model on DigitalOcean.
Can you help me get started with how to do that?

This is really interesting. I’m just wondering whether there is a blog post with more details.

Hi Hari. Unfortunately, no. This is part of our internal documentation for developers and product team.
