Exposing DL models as APIs/microservices

cedric · September 3, 2018, 3:29am

I explored this area further while building a real-world data product recently. The design was inspired by Dave’s posts.

Application System Architecture for Data-driven Product

While our application’s user interface will demonstrate what is possible, it needs to be loosely coupled to the trained models that do the core predictive tasks.

To preserve a bright-line separation of concerns, we break the overall application down into several constituent pieces. Here’s an extremely high-level view of the component hierarchy:

The job of the prediction service (via the trained models it wraps) is to implement these core predictive tasks and expose them for use. The models themselves shouldn’t need to know about the prediction service, which in turn should not need to know anything about the interface application.
The job of the interface backend (API) is to ferry data back and forth between the client browser and the model service, handle web requests, take care of computationally intensive transformations not appropriate for frontend JavaScript, and persist user-entered data to a data store. It should not need to know much about the interface frontend, but its main job is to relay data for frontend manipulation, so it’s acceptable for this part to be less abstractly generalizable than the prediction service.
The job of the interface frontend (UI) is to demonstrate as much value as possible by exposing functionality that the models make possible in an intuitive and attractive format.
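
The separation described above can be sketched roughly as follows. This is only a minimal illustration, not the actual design from the product: the names PredictionService, InterfaceBackend and handle_request are hypothetical, the "model" is a toy function, and the in-memory list stands in for a real data store. The point is the direction of the dependencies: the model knows nothing about the service, and the service knows nothing about the backend.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a trained model: any callable from features to a score.
Model = Callable[[List[float]], float]


class PredictionService:
    """Wraps trained models and exposes them by name; knows nothing about the UI."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, name: str, model: Model) -> None:
        self._models[name] = model

    def predict(self, name: str, features: List[float]) -> float:
        return self._models[name](features)


class InterfaceBackend:
    """Relays data between the client and the prediction service, and persists input."""

    def __init__(self, service: PredictionService) -> None:
        self._service = service
        self.store: List[dict] = []  # in-memory stand-in for a data store

    def handle_request(self, payload: dict) -> dict:
        # Transformation done server-side rather than in frontend JavaScript.
        features = [float(x) for x in payload["features"]]
        self.store.append(payload)  # persist user-entered data
        score = self._service.predict(payload["model"], features)
        return {"model": payload["model"], "score": score}


service = PredictionService()
service.register("mean", lambda xs: sum(xs) / len(xs))  # toy "model"
backend = InterfaceBackend(service)
response = backend.handle_request({"model": "mean", "features": ["1", "2", "3"]})
print(response)
```

In a real deployment the call from the backend to the prediction service would typically go over HTTP or a message queue rather than a direct method call, but the dependency direction stays the same.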

Here’s a visual representation of this architecture:

Even · September 4, 2018, 4:46am

@cedric You should take a look at clipper.ai, which @QWERTY1 recently shared. It’s out of the Berkeley RISE lab and is a very well thought out framework for model serving as an API. The website doesn’t really do the framework justice in my mind, and the videos are definitely worth looking at. It’s very similar to what you’ve laid out, but has a few more details outlined. It looks like you’ve thought of some other aspects as well, so it may be worthwhile joining forces and contributing your ideas/work.

I’m currently trying to convince my company to adopt it for model serving so that we can work on it and help improve it. So far I’ve been very impressed with what it does and with their roadmap.

cedric · September 5, 2018, 6:55am

Hi Even, thank you for sharing. That sounds very interesting. This is my first time hearing about clipper.ai. I have seen Polyaxon before. I have glanced through clipper.ai’s website and you are right, it’s a bit light on information. With that in mind, I headed over to their codebase and took a quick peek at some of the code and Dockerfiles there. So far, my impression is that it’s worth looking at, so I plan to take a more serious look soon and see whether I can contribute in some way if time allows.

I see. Good to hear.

Even · September 7, 2018, 3:21am

Check out the video in the other link. It gives a much more solid overview. Definitely seems worth exploring in detail.

ryanfras · September 29, 2018, 8:34am

I found this video, which was presented at the AWS London Summit 2018. It doesn’t have many views, so I decided to share it in this thread. I think it will be really useful for anyone trying to deploy their fastai models (on AWS at least):

Building, Training and Deploying Custom Algorithms Such as Fast.ai with Amazon SageMaker:

zeochoy · October 6, 2018, 9:53am

Thanks for your great tutorial! I successfully followed your plan and was able to deploy a skin mole detection web app (http://104.248.146.179/) on DigitalOcean. Two things bothered me a little: 1) I used ResNeXt50 and had to copy the model into the app folder to get it to work; 2) remember to check whether OpenCV can be imported properly on DigitalOcean, as I had to install some libraries to get it to work.

P.S. The GitHub student pack includes $50 credit for DigitalOcean.
Web app GitHub: https://github.com/zeochoy/skinapp

renjiege · October 23, 2018, 4:56am

@zeochoy Thanks for sharing. I went through your code and found it super helpful for a newbie like me to understand the deployment process. However, I am still not very clear about how you trained and saved the model. Specifically, could you elaborate on, or share the code for, how you saved the fast.ai model as a PyTorch model? Did you train your model using the fastai library or only PyTorch (skinmodel.py)?

Thanks in advance for your help.

anasuna · November 26, 2018, 4:33pm

Hello, when exporting my model from fastai to pytorch using

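import torch

# saving: pickles the whole model object (architecture and weights)
torch.save(learn.model, 'unet.pt')
# loading: the model's class definitions must be importable here
model = torch.load('unet.pt')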

I don’t get the same result because of the preprocessing. I am using the preprocessing from tfms_from_model for resnet. Is there a way to import that preprocessing into PyTorch, or to find out which preprocessing is used for resnet?
Please tell me if I am doing something wrong.
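
One common cause of this kind of mismatch is that the per-channel normalization applied during training is not reproduced at inference time. The sketch below is only an illustration under an assumption: that the training transforms normalized with the standard ImageNet channel statistics, which is the usual convention for pretrained ResNets. It is written in plain Python to keep the arithmetic visible; the function name preprocess is hypothetical, and resizing or cropping (which must also match training) is left out.

```python
from typing import List

# ImageNet channel statistics (RGB), the usual convention for pretrained ResNets.
# Assumption: the training pipeline used these same values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

Pixel = List[float]                 # [r, g, b], each in 0-255
ImageHWC = List[List[Pixel]]        # height x width x channel
ImageCHW = List[List[List[float]]]  # channel x height x width


def preprocess(image: ImageHWC) -> ImageCHW:
    """Scale 0-255 pixels to [0, 1], normalize per channel, reorder HWC -> CHW."""
    channels = []
    for c in range(3):
        plane = [
            [(pixel[c] / 255.0 - IMAGENET_MEAN[c]) / IMAGENET_STD[c] for pixel in row]
            for row in image
        ]
        channels.append(plane)
    return channels


image = [[[255, 0, 124], [0, 255, 124]]]  # a 1 x 2 RGB image
chw = preprocess(image)
print(len(chw), len(chw[0]), len(chw[0][0]))  # channels, height, width
```

In practice the same normalization is usually done with torchvision.transforms.Normalize on a tensor. It is also worth calling model.eval() before inference so that dropout and batch norm layers behave as they did during validation.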

Paolo · November 28, 2018, 4:23pm

I have exactly the same question you have, but on a modified unet (based on the carvana one). Did you manage to solve it?

shivamchandhok · February 7, 2019, 10:19am

Hey, I want to deploy an NLP model on DigitalOcean.
Can you help me get started with doing that?

harikrishnanrajeev · March 2, 2019, 8:10am

This is really interesting. Just wondering whether there is a blog post with more details.

cedric · March 3, 2019, 6:52pm

Hi Hari. Unfortunately, no. This is part of our internal documentation for our developers and product team.

Paul2019nz · March 5, 2019, 8:21pm

Hi,

I am a newbie with fastai, but I have been reading this interesting thread on deployment. For IoT applications, you could use a messaging framework to send images or data to the inference server for processing: something like MQTT or CoAP for embedded devices, or nanomsg or ZeroMQ on a POSIX system (Linux). The server application runs as a microservice and can communicate with other services (REST, security, and so on) using the messaging frameworks that cloud providers usually offer. If you are not using the cloud, you could use RabbitMQ or Apache ActiveMQ to communicate between your services. Web or mobile apps just need to hit the REST service to get an update.
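
As a rough illustration of that pattern, here is a minimal sketch in which devices push messages onto a queue and an inference worker consumes them. Everything here is an assumption for the sake of the example: the standard-library queue.Queue stands in for a real broker such as MQTT or ZeroMQ, fake_model stands in for actual inference, and the device names are made up.

```python
import queue
import threading
from typing import Callable, Dict, Optional, Tuple

Message = Optional[Tuple[str, bytes]]  # (device_id, payload); None means "shut down"


def fake_model(payload: bytes) -> str:
    """Hypothetical stand-in for real inference on an image or sensor reading."""
    return "large" if len(payload) > 4 else "small"


def inference_worker(
    inbox: "queue.Queue[Message]",
    results: Dict[str, str],
    model: Callable[[bytes], str],
) -> None:
    """Consume messages until the shutdown sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:
            break
        device_id, payload = message
        results[device_id] = model(payload)


inbox: "queue.Queue[Message]" = queue.Queue()  # stand-in for the message broker
results: Dict[str, str] = {}
worker = threading.Thread(target=inference_worker, args=(inbox, results, fake_model))
worker.start()

# Devices publish their data without knowing anything about the model.
inbox.put(("sensor-1", b"abc"))
inbox.put(("sensor-2", b"abcdefgh"))
inbox.put(None)  # ask the worker to stop
worker.join()
print(results)
```

With a real broker, the devices and the worker would run in separate processes or on separate machines, and the results would be published back on another topic or stored for the REST service to read.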

Paul

matt.mcclean · March 5, 2019, 10:23pm

I am the presenter in that video.

We have published a couple of new deployment guides based on Amazon SageMaker and AWS Lambda.

rezas · February 20, 2020, 3:45am

Do we have a feature like TensorFlow Serving in PyTorch?