I’m deeply interested in this as well, especially deploying PyTorch models, since that’s my main framework now.
I did write a blog post, back when I was working primarily in Keras, on how to export a model for deployment on tensorflow-serving.
Getting the export configuration right took several days’ worth of digging around and was a significant challenge, so hopefully some people will find it helpful.
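For anyone who just wants the gist without reading the post, the export step looked roughly like the sketch below. This is a minimal, era-appropriate example using standalone Keras with the TF 1.x SavedModelBuilder API; the toy model, the export path, and the signature names ("inputs", "scores") are placeholders for illustration, not the exact configuration from the post.

```python
# Minimal sketch (placeholders, not my exact config): export a Keras model as a
# TF 1.x SavedModel that tensorflow-serving can load.
import tensorflow as tf
from keras import backend as K
from keras.layers import Dense
from keras.models import Sequential

K.set_learning_phase(0)  # make sure dropout/batchnorm run in inference mode

# Stand-in for the actual trained model
model = Sequential([Dense(10, activation="softmax", input_shape=(784,))])

export_path = "/tmp/exported_model/1"  # tf-serving expects a numeric version subdirectory

builder = tf.saved_model.builder.SavedModelBuilder(export_path)
signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={"inputs": model.input},
    outputs={"scores": model.output},
)
builder.add_meta_graph_and_variables(
    sess=K.get_session(),
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature
    },
)
builder.save()
```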
The post doesn’t cover the tf-serving side, which is a whole other challenge in and of itself to set up and get running correctly. I’m hoping the engineer I worked with on that is going to publish a matching blog post soon, and if/when he does I’ll link to it from mine.
We ended up wrapping the call in a Tornado server to handle errors and do pre/post-processing. We explored Docker initially for scalability, but with a GPU instance you can’t deploy more than one container per machine, so there wasn’t much point; for CPU inference it makes more sense.
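If it helps, here’s the spirit of that wrapper. This is a hedged sketch, not our production code: the `preprocess`, `postprocess`, and `run_inference` helpers and the `/predict` route are made up for illustration, and the actual call out to the model (or across to tf-serving) would live inside `run_inference`.

```python
import json

import tornado.ioloop
import tornado.web


def preprocess(payload):
    # Hypothetical: turn the raw JSON payload into model-ready inputs
    return payload["instances"]


def postprocess(predictions):
    # Hypothetical: turn raw predictions into a JSON-friendly response
    return {"predictions": predictions}


def run_inference(inputs):
    # Hypothetical stand-in for the call to the model / tf-serving
    return [[0.1, 0.9] for _ in inputs]


class PredictHandler(tornado.web.RequestHandler):
    def post(self):
        try:
            payload = json.loads(self.request.body)
            outputs = run_inference(preprocess(payload))
            self.write(postprocess(outputs))
        except (ValueError, KeyError) as err:
            # Error handling: return a 400 instead of letting the worker fall over
            self.set_status(400)
            self.write({"error": str(err)})


if __name__ == "__main__":
    app = tornado.web.Application([(r"/predict", PredictHandler)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
```

The main point is just that the handler validates the request and converts failures into 4xx responses, rather than letting bad inputs crash the serving process.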