Productionizing models thread

Is there a version of this thread for beginners?

In lecture 2, Jeremy said to try to build a web app “this week”, and said that there are lots of tutorials out there. From the two days’ worth of tutorials I’ve watched so far, I have gathered that in order to build a web app, you need a baseline understanding of the following areas:

  1. A framework like Flask or Django
  2. JavaScript
  3. CSS
  4. HTML
  5. Template Engines
  6. Maybe SQL too…

That’s not happening “this week”, at least for me. I feel like I am missing something. Can someone point the dumb new person to the right place?


To accelerate inference on CPU, we can convert the PyTorch model to an optimized OpenVINO model from Intel (https://software.intel.com/en-us/openvino-toolkit), passing through an intermediate ONNX model. I’ve just updated a Colab notebook (https://colab.research.google.com/drive/1z4GxsF6LaVl5cSeGgCbWTefuUIz78gfV) that summarizes all the steps to install the latest OpenVINO (l_openvino_toolkit_p_2020.2.120) and run a ResNet fastai v1 model. The work is based on this notebook for an older OpenVINO version (https://colab.research.google.com/drive/1TdjV6bSrgSAL6RcGsYzCaWMdaqaqCYID).
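In rough terms the pipeline looks like this (a sketch rather than the exact notebook code; `learn`, the 224×224 input size and the file paths are assumptions):

```python
# Sketch: fastai v1 Learner -> ONNX -> OpenVINO IR -> Inference Engine.
# `learn`, the input size and all paths are assumptions.
import torch

model = learn.model.eval().cpu()
dummy = torch.randn(1, 3, 224, 224)                      # one dummy input batch
torch.onnx.export(model, dummy, "resnet.onnx",
                  input_names=["input"], output_names=["output"])

# Convert the ONNX file to OpenVINO IR with the Model Optimizer (shell command;
# mo.py typically lives under deployment_tools/model_optimizer in 2020.x installs):
#   python mo.py --input_model resnet.onnx --output_dir ir/

# Run the IR on CPU with the Inference Engine Python API (OpenVINO 2020.x):
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="ir/resnet.xml", weights="ir/resnet.bin")
exec_net = ie.load_network(network=net, device_name="CPU")
input_blob = next(iter(net.inputs))                       # name of the input layer
result = exec_net.infer(inputs={input_blob: image_nchw})  # float32 NCHW numpy array
```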

I tested with 3 images on CPU:

  • Without Openvino: ~ 700ms
  • With Openvino: ~ 500ms

The gap is not huge, but you can also use accelerators that are compatible with OpenVINO (Intel GPUs, the Neural Compute Stick), which are normally cheaper than an NVIDIA GPU.

Hi, I was just about to look through the code, but permission denied :joy:

Ah sorry. Does it work with this one? https://colab.research.google.com/drive/1z4GxsF6LaVl5cSeGgCbWTefuUIz78gfV


Laura, thanks for sharing this.

I’ve been trying to deploy a fastai model using lambda with no success.
As I was not able to use a public PyTorch layer, I’m building my own.
Applying all the recommendations, I ended up with dependencies totaling 437 MB. The compressed layer is 130 MB.

When trying to run the lambda function, I get the error ‘No space left on device’.

Below are the largest dependency folders:
437M ./build/requirements
305M ./build/requirements/torch
272M ./build/requirements/torch/lib
68M ./build/requirements/numpy
29M ./build/requirements/numpy/.libs
23M ./build/requirements/numpy/core
23M ./build/requirements/pandas
17M ./build/requirements/pandas/_libs
17M ./build/requirements/torch/bin
12M ./build/requirements/torchvision
11M ./build/requirements/numpy/random
11M ./build/requirements/torch/include
8.6M ./build/requirements/matplotlib
8.0M ./build/requirements/caffe2
7.2M ./build/requirements/caffe2/python

Any hint/suggestion will be appreciated!

Hi Alejandro

Yes, I see: the problem is the PyTorch dependency, which is too big. That’s why I used the separate public layer for PyTorch.

This post by @matt.mcclean explains how the pytorch layer was made, it might help:

AWS Lambda has a limit of 250 MB for the deployment package size, including Lambda layers. PyTorch plus its dependencies is larger than this, so we need a trick to get around the limit. We will create a zipfile called .requirements.zip with all the PyTorch and associated packages. We will then add this zipfile to the Lambda layer zipfile along with a Python script called unzip_requirements.py. The script will extract .requirements.zip to /tmp when the Lambda execution context is created.
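For reference, the trick boils down to something like this (a minimal sketch, assuming the zip ends up at /opt/.requirements.zip when the layer is mounted; paths are assumptions):

```python
# Sketch of unzip_requirements.py: extract the bundled .requirements.zip from
# the Lambda layer (mounted under /opt) into /tmp and put it on sys.path,
# which sidesteps the 250 MB unzipped package limit.
import os
import shutil
import sys
import zipfile

pkg_dir = "/tmp/requirements"           # where the packages get unpacked
zip_path = "/opt/.requirements.zip"     # path inside the layer (assumption)

sys.path.append(pkg_dir)

if not os.path.exists(pkg_dir):
    tmp_dir = pkg_dir + "_in_progress"
    if os.path.exists(tmp_dir):
        shutil.rmtree(tmp_dir)
    zipfile.ZipFile(zip_path, "r").extractall(tmp_dir)
    os.rename(tmp_dir, pkg_dir)         # atomic-ish rename once extraction is done
```

The Lambda handler then does `import unzip_requirements` before importing torch, so the extraction happens once per cold start.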

Hope that helps!

Laura


Thank you for this wonderful thread.
How can I store uploaded image files on the server?
For example, in the OnRender example, the uploaded image file is not stored on the server.
However, it would be better to save the users’ input images on the server, and possibly update the model in the future with a larger image database.
Does anybody have ideas on how to collect the uploaded files?
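Something along these lines is what I have in mind (a rough sketch based on the /analyze route in the Render template; the `learn` Learner and the uploads/ directory are assumptions):

```python
# Sketch: keep a copy of every uploaded image before running inference.
# Assumes the Starlette app from the Render template and an already-loaded `learn`.
from io import BytesIO
from pathlib import Path
from uuid import uuid4

from fastai.vision import open_image
from starlette.applications import Starlette
from starlette.responses import JSONResponse

app = Starlette()
upload_dir = Path("uploads")
upload_dir.mkdir(exist_ok=True)

@app.route("/analyze", methods=["POST"])
async def analyze(request):
    form = await request.form()
    img_bytes = await form["file"].read()
    # store the raw upload so it can feed a larger training set later
    (upload_dir / f"{uuid4().hex}.png").write_bytes(img_bytes)
    pred = learn.predict(open_image(BytesIO(img_bytes)))[0]
    return JSONResponse({"result": str(pred)})
```

Note that on many hosting platforms the local disk is ephemeral, so for long-term collection you would typically push the files to external storage (e.g. S3) rather than the app’s own filesystem.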

@wcneill forking this repo is probably the easiest way to get started, and if you want to create a web app that doesn’t require uploading photos, you can also fork this repo I made, which works in conjunction with the render template I posted above.

Correct me if I’m wrong, but this method loads the model and uses it for inference every time you call it, right?
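i.e. roughly the difference between these two patterns (all names here are hypothetical, fastai v1 API):

```python
# Hypothetical illustration of the concern.
from io import BytesIO
from fastai.basic_train import load_learner
from fastai.vision import open_image

path = "."  # directory containing export.pkl (assumption)

# Pattern A: the learner is reloaded from disk on every request (slow, repeated I/O).
def predict_per_call(img_bytes):
    learn = load_learner(path, "export.pkl")
    return learn.predict(open_image(BytesIO(img_bytes)))[0]

# Pattern B: the learner is loaded once at startup and reused across requests.
learn = load_learner(path, "export.pkl")

def predict_reuse(img_bytes):
    return learn.predict(open_image(BytesIO(img_bytes)))[0]
```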