Would you use the fastai library to run the model in production, or would you need to save it as a PyTorch model? Is there an optimized version of the model that is better suited to production environments?
I’ll give this a shot. If you are deploying deep learning models at scale, here are some best practices for designing the architecture:
- select the right deep learning framework for production.
- separation of concerns
  - break the overall application down into several constituent pieces.
  - it needs to be loosely coupled to the trained models which are doing the core predictive tasks.
- select the right stack
  - load balancing and automatic scaling (e.g. with Kubernetes/Docker container)
  - GPU memory management (take extra care as different models have different memory characteristics, even for inference mode)
The following diagram represents the stack (server architecture):
Web front-end (user interface) <--> Load balancer <--> Web back-end (Node.js: building and serving RESTful APIs) <--> Distributed task queue (Celery) <--> Model service (Flask app: pre-process scripts, trained/serialized PyTorch models which are doing core predictive tasks)
If you plan to use TensorFlow and Node.js together to simplify things a little, the TensorFlow.js team is actively working on creating Node.js bindings to the TensorFlow C API: https://js.tensorflow.org/faq/
One way: deploy a Python web framework app (Flask, Django, Tornado, asyncio; pick your flavor) that takes a request and returns a JSON response (an API service). This app loads your PyTorch model, TensorFlow model, or a model from any other DL framework, runs the prediction task, and returns a response. You will find a number of examples on GitHub for these tasks.
Now that you have an API service, you can query it from your Node.js back-end app or directly from any other app. If you prefer, you can put it in a Docker container and deploy it in a VPC that only your back-end app has access to.
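For illustration, here is a minimal sketch of that request-in, JSON-out pattern. It is built on assumptions rather than on anything posted above: it uses Python’s standard-library `http.server` in place of Flask so that it runs without extra dependencies, the `/predict` route and the `features` field are made-up names, and `predict` is a stand-in where a real service would call the loaded PyTorch or TensorFlow model.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a real model call (hypothetical): mean of the inputs."""
    return sum(features) / len(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # hypothetical route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = {"prediction": predict(payload["features"])}
            status = 200
        except (ValueError, KeyError, TypeError, ZeroDivisionError) as exc:
            # Malformed JSON, missing field, or empty input: report a client error.
            result = {"error": str(exc)}
            status = 400
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_server(port=0):
    """Serve on a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(port, features):
    """Client side: what the back-end app would do, written here in Python."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        body = json.dumps({"features": features}).encode("utf-8")
        conn.request("POST", "/predict", body=body,
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

The model would normally be loaded once at startup, not inside the handler, so that each request only pays for the forward pass and the network round trip.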
Thanks for both responses. @ramesh, this is probably the simpler approach and the one I’ll likely use, given the lack of maturity around Node.js bindings for PyTorch (which is what I’m using for now). The disadvantage, however, is that I’m trying to reduce latency: I’m going to be making multiple requests per second for predictions, so the round trip between the two web apps is undesirable.
Do you need to do something special to your models to optimize them for high throughput (ignoring load balancing and other optimizations outside of the model itself)?
The short answer is - I don’t know.
I would first try to see what the best possible response time is. You can use TF or Keras or whatever and see what the lowest possible latency is. That then becomes the target I am working towards. IMO, if you want responses in less than a second, you have to look into caching.
Second, if you get multiple simultaneous requests per second, you might be better off batching them: say, collect everything that came in during a 500 ms window into one forward pass, then return the results. This certainly requires some more engineering and I have not done it myself. This approach was talked about at the TensorFlow summit last year.
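A rough sketch of the batching idea, under stated assumptions: this is not the implementation from the summit talk, the `MicroBatcher` class and its parameters are hypothetical names, and `double_all` stands in for one batched forward pass. A worker thread waits for the first request, keeps collecting until the window closes or the batch is full, then makes a single call for the whole batch.

```python
import queue
import threading
import time
from concurrent.futures import Future


class MicroBatcher:
    """Groups requests arriving within a time window into one batched call."""

    def __init__(self, predict_batch, window_s=0.5, max_batch=32):
        self._predict_batch = predict_batch
        self._window_s = window_s
        self._max_batch = max_batch
        self._queue = queue.Queue()
        self.batch_sizes = []  # recorded for inspection only
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item):
        """Enqueue one request; the Future resolves once its batch has run."""
        fut = Future()
        self._queue.put((item, fut))
        return fut

    def _run(self):
        while True:
            batch = [self._queue.get()]  # block until the first request arrives
            deadline = time.monotonic() + self._window_s
            while len(batch) < self._max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            items = [item for item, _ in batch]
            futures = [fut for _, fut in batch]
            try:
                outputs = self._predict_batch(items)  # one call per batch
                self.batch_sizes.append(len(items))
                for fut, out in zip(futures, outputs):
                    fut.set_result(out)
            except Exception as exc:
                for fut in futures:
                    fut.set_exception(exc)


def double_all(items):
    """Stand-in for one batched forward pass (hypothetical model)."""
    return [2 * x for x in items]
```

Usage would look like `batcher = MicroBatcher(double_all, window_s=0.5)` followed by `batcher.submit(x).result()` in each request handler. The trade-off is that the first request in a batch can wait up to the full window, so the window length is a latency budget as much as a throughput setting.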
Sorry to necro the post, but I’m currently looking into solving the same problem. Is an acceptable solution just to get Node.js to make a command-line call like [“python”, “machine_learning_script.py”]? I don’t necessarily see the disadvantages of this approach, especially compared to the Python Flask microservice.
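To make the command-line approach concrete, here is a small sketch under assumptions: the worker script is inlined as a string instead of living in `machine_learning_script.py`, the JSON-over-stdin/stdout contract is made up for illustration, and the caller is written in Python with `subprocess` where Node.js would use its own child-process API. The point it shows is that every call starts a fresh interpreter, so a real script would also reload its model on every call.

```python
import json
import subprocess
import sys

# Stand-in for the contents of the prediction script (hypothetical).
WORKER_SOURCE = r'''
import json, sys
request = json.load(sys.stdin)
# A real script would load the model here, on every single invocation.
features = request["features"]
json.dump({"prediction": sum(features) / len(features)}, sys.stdout)
'''


def predict_via_subprocess(features):
    """Spawn a new Python process per prediction and parse its JSON output."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps({"features": features}),
        capture_output=True,
        text=True,
        check=True,  # raise if the worker exits with a non-zero status
    )
    return json.loads(completed.stdout)
```

For occasional predictions the startup cost may not matter. At several requests per second, a long-running service that keeps the model in memory avoids paying it each time.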
If you still need that, check my toy project: it uses a simple Node.js module to wrap the C++ PyTorch lib and load a traced model. https://github.com/vvmnnnkv/nodejs-style-transfer
I made an example repo for creating a web app with Node and sending the client-loaded image off to a Python server to do the predicting. Here you go: https://github.com/chezky/Node-Py-Websocket.git
Hello there, I hope to get a response. I have a Jupyter notebook from which I’m saving a CatBoostClassifier model as a .h5 format JSON file. I want to load this model file in Node.js and use it to predict results from inputs that I supply. I have been researching this for literally a week. Can anyone here help me? Do you know anything about what I’m talking about? I need to know which libraries I can use to load this model in Node.js. (P.S. I already tried TensorFlow.js, but it doesn’t work and I don’t think it’s compatible; I ran into an error I could not resolve when I worked with it.) I can post the code and the error here if you want.
Thank you so much.