Integrating trained model into nodejs backend

(Michael) #1

Does anyone have experience integrating a trained model as a service (possibly a microservice) into a NodeJS-based backend server architecture? One approach might be to have a separate Python-based microservice that the NodeJS server can send requests to. I’m also curious about best practices for integrating the model directly into NodeJS to run it natively in JavaScript.

Would you use the fastai library to run the model in production, or would you need to save it as a PyTorch model? Is there an optimized version of the model that is better suited to production environments?

(Cedric Chee) #2

I’ll give this a shot. If you are deploying deep learning models at scale, here are some best practices for designing the architecture:

  • select the right deep learning framework for production.
  • separation of concerns
    • break the overall application down into several constituent pieces.
  • interface
    • it needs to be loosely coupled to the trained models which are doing the core predictive tasks.
  • select the right stack
  • load balancing and automatic scaling (e.g. with Kubernetes/Docker container)
  • GPU memory management (take extra care as different models have different memory characteristics, even for inference mode)

The following diagram represents the stack (server architecture):

Web front-end (user interface) <--> Load balancer <--> Web back-end (Node.js: building and serving RESTful APIs) <--> Distributed task queue (Celery) <--> Model service (Flask app: pre-process scripts, trained/serialized PyTorch models which are doing core predictive tasks)


If you plan to use TensorFlow and Node.js together to simplify things a little, the TensorFlow.js team is actively working on creating Node.js bindings to the TensorFlow C API.

(Ramesh Sampath) #3

One way: deploy a Python web framework app (Flask, Django, Tornado, asyncio; pick your flavor) that takes a request and returns a JSON response (an API service). This app loads your PyTorch model, TensorFlow model, or a model from any other DL framework, runs the prediction task, and returns a response. You will find a number of examples on GitHub for these tasks.
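A minimal sketch of such an API service, to make the shape concrete. To stay dependency-free it swaps in Python's standard-library `http.server` in place of Flask, and `predict` is a hypothetical stand-in for a real model call. The `/predict` route, the `features` field, and port 8000 are all assumptions for illustration, not anything prescribed by a framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a model forward pass.

    A real service would load the trained model once at startup and call it here.
    """
    return {"score": sum(features) / len(features)}


def handle_predict(body: bytes):
    """Turn a raw JSON request body into (status_code, response_dict)."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("empty features")
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing key, or non-numeric values all end up here.
        return 400, {"error": f"bad request: {exc}"}
    return 200, predict(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # assumed route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_predict(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000):
    """Start the service on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server, after which any HTTP client can POST `{"features": [1, 2, 3]}` to `/predict`. Keeping the request handling in a plain function (`handle_predict`) makes it easy to test without a running server and to move to Flask or another framework later.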

Now that you have an API service, you can query it from your NodeJS backend app or directly from any other app. If you prefer, you can put it in a Docker container and deploy it in a VPC that only your backend app has access to.

(Michael) #4

Thanks for both responses. @ramesh this is probably the simpler approach and the one I’ll likely use, given the lack of maturity around NodeJS bindings for PyTorch (which is what I’m using for now). However, the disadvantage is that I’m trying to reduce latency, as I’ll be making multiple prediction requests per second, so the round trip between the two web apps is undesirable.

Do you need to do something special to your models to optimize them for high throughput (ignoring load balancing and other optimizations outside of the model itself)?

(Ramesh Sampath) #5

The short answer is - I don’t know.

I would first try to see what the best possible response time is. You can use TF or Keras or whatever and see what the lowest possible latency is. That then becomes the target I am working towards. IMO, if you want a response in less than a second, you have to look into caching.
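As a rough illustration of both steps (measure a baseline, then cache), here is a small sketch using only the standard library. `slow_predict` is a hypothetical stand-in that sleeps to mimic model latency, and the 50 ms figure and cache size are arbitrary assumptions. Caching like this only helps when identical inputs recur.

```python
import time
from functools import lru_cache


def slow_predict(features: tuple) -> float:
    """Hypothetical stand-in for a model forward pass; sleeps to mimic latency."""
    time.sleep(0.05)
    return sum(features) / len(features)


@lru_cache(maxsize=1024)
def cached_predict(features: tuple) -> float:
    """Memoise predictions; inputs must be hashable, hence a tuple rather than a list."""
    return slow_predict(features)


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    x = (1.0, 2.0, 3.0)
    _, cold = time_call(cached_predict, x)  # first call runs the "model"
    _, warm = time_call(cached_predict, x)  # second call is served from the cache
    print(f"cold: {cold * 1000:.1f} ms, warm: {warm * 1000:.3f} ms")
```

The cold timing is the number to compare against your latency target; the warm timing shows the best case when a request repeats.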

Second, if you get multiple simultaneous requests per second, you might be better off batching them, say everything that came in within a 500 ms window, into one forward pass and then returning the results. This certainly requires some more engineering, and I have not done it myself. This approach was talked about at the TensorFlow summit last year.
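For what it's worth, the batching idea could look roughly like the sketch below. It is not taken from the summit talk; `MicroBatcher`, `predict_batch`, and `max_batch` are hypothetical names, and it assumes `predict_batch` takes a list of inputs and returns one result per input in the same order.

```python
import queue
import threading
import time
from concurrent.futures import Future


class MicroBatcher:
    """Group requests arriving within `window_s` seconds into one batched call."""

    def __init__(self, predict_batch, window_s=0.5, max_batch=32):
        self.predict_batch = predict_batch  # assumed: list in, list of same length out
        self.window_s = window_s
        self.max_batch = max_batch
        self._queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item) -> Future:
        """Enqueue one input; the returned Future resolves to its prediction."""
        fut = Future()
        self._queue.put((item, fut))
        return fut

    def _run(self):
        while True:
            batch = [self._queue.get()]  # block until the first request arrives
            deadline = time.monotonic() + self.window_s
            # Keep collecting until the window closes or the batch is full.
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            items = [item for item, _ in batch]
            try:
                results = self.predict_batch(items)  # one forward pass for the batch
                for (_, fut), res in zip(batch, results):
                    fut.set_result(res)
            except Exception as exc:
                # Propagate the failure to every caller still waiting.
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
```

Each request handler would call `batcher.submit(x).result()` and block until its batch has run. The trade-off is that a request can wait up to the full window before the forward pass even starts, so a 500 ms window adds up to 500 ms of latency in exchange for throughput; a shorter window or a smaller `max_batch` shifts that balance.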