Productionizing models thread

I created a repository to test PyTorch models in Flask and then deploy them to AWS Lambda using Zappa.
Link to repository: https://github.com/pedrohbtp/pytorch-zappa-serverless

The GIF above is served by an AWS Lambda deployment of the pre-trained English language model available in fastai. You can test it out here: https://pedrohbtp.github.io/language-model/

AWS Lambda has package size limits, so I had to use fastai’s underlying PyTorch models instead of installing the entire fastai library.
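Since every dependency counts against Lambda’s limit (roughly 250 MB unzipped at the time of writing), it helps to know which packages dominate the bundle before deciding what to cut. A quick stdlib-only audit like this can help; the function and the commented usage are just a sketch, not part of the repo:

```python
import os

def package_sizes(site_packages):
    """Return (size_in_bytes, name) for each top-level entry, largest first."""
    sizes = []
    for entry in os.listdir(site_packages):
        path = os.path.join(site_packages, entry)
        if os.path.isdir(path):
            # Sum every file in the package directory.
            total = 0
            for root, _dirs, files in os.walk(path):
                for f in files:
                    total += os.path.getsize(os.path.join(root, f))
        else:
            total = os.path.getsize(path)
        sizes.append((total, entry))
    return sorted(sizes, reverse=True)

# Example: report the ten biggest dependencies in the current environment.
# import sysconfig
# for size, name in package_sizes(sysconfig.get_paths()["purelib"])[:10]:
#     print(f"{size / 2**20:8.1f} MiB  {name}")
```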

If someone wants an introduction to Flask, I wrote a little about it here.


@Andreas_Daiminger take a look at paragraph three of https://www.codacy.com/blog/five-ways-to-slim-your-docker-images/ ; it shows a simple way to minimize your Docker image.


Thanks @elruso! Very useful if you wanna become a Docker power user!

It appears that the Zeit tarball linked to in the deployment tutorial on the website isn’t compatible with the most recent version of fastai. I tried modifying it to use ImageDataBunch.load_empty() but ran into some trouble.

It did deploy, but it doesn’t appear to actually return the prediction. I hit Analyze, the button switches to Analyzing..., and that’s it. Here’s my code; I’d be super grateful if someone could look through it and try to pinpoint why this might be happening. Look at lines 14, 15, 33, 34, 35 for the changes.

P.S. I’ve never actually created a web app before so I might just be missing something silly.


Thanks for your link.

I was able to train my model and get something up and running more or less along your guidelines.

I had to do a lot to get everything to fit inside Lambda, as the default deployment was far too big. It was so oversized that I suspect you may have left some of those steps out of your repo?

I couldn’t find any way to tree-shake in Python “automatically”, so I did a bit of it by hand and was able to shrink things enough.

I also had to load my model into memory directly from S3 rather than storing it locally, as I couldn’t fit both the code and the model in the scratch space.
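Loading the weights straight from S3 into an in-memory buffer avoids touching Lambda’s limited /tmp scratch space. A minimal sketch of that idea (the bucket and key names are hypothetical, and the S3 client is passed in rather than created inside so the download logic stays testable):

```python
import io

def load_model_buffer(s3_client, bucket, key):
    """Download an S3 object into an in-memory buffer (no disk writes)."""
    buf = io.BytesIO()
    s3_client.download_fileobj(bucket, key, buf)  # boto3's streaming download
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf

# Typical use inside the Lambda handler (assumes torch and boto3 are bundled):
# import boto3, torch
# buf = load_model_buffer(boto3.client("s3"), "my-models-bucket", "model.pth")
# model = torch.load(buf, map_location="cpu")
```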

My ‘deployed’ version is here: https://gdoteof.github.io/neuralnet_stuff/

I have not done a writeup yet (though all the code, complete with my fumblings, is in the above repo).

If anyone ends up on this thread trying to figure out how to shrink their deployment to fit on lambda, take a look at my zappa config to see some things you can remove.
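For anyone looking for the concrete knobs: Zappa’s zappa_settings.json accepts an exclude list of glob patterns that are stripped from the uploaded package, and a slim_handler option that offloads large packages to S3. Note that boto3/botocore can often be excluded because the Lambda runtime already provides them. The snippet below is illustrative, not a copy of my actual config:

```json
{
  "production": {
    "app_function": "app.app",
    "s3_bucket": "my-zappa-bucket",
    "aws_region": "us-east-1",
    "slim_handler": true,
    "exclude": [
      "tests/*",
      "*.dist-info/*",
      "boto3/*",
      "botocore/*"
    ]
  }
}
```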

I am glad you also made it work. Great work there.
If there is interest, I can try to explain the deployment a little better and add more documentation to the readme.

Indeed, there is no automatic tree-shaking. When I did it, I had to choose which libraries to include. That is why I didn’t include fastai itself, only the PyTorch-related dependencies. I apologize if that caused any trouble; I am not sure whether I left any unnecessary dependencies there. I can double-check.

It is really not possible to fit both the model and the code into the Lambda space. We have to read the model from S3 every time, which might make loading a little slower.

Did you remove code by actually deleting the libraries rather than just excluding them from the payload? That may be why I missed it.

I would encourage you to expand on what you’ve written already. I found it very useful, but still struggled (a lot) with actually getting it deployed. That was partly due to having a slow uplink at the time I was working on it (250MB uploads on 1Mb/s make for slow iterations).

In particular, expanding on my list of excludes would be helpful. And in general, just clean up your repo a bit so it is accurate for deploying on Lambda. Your latest commit, for example, attempts to use scratch space rather than loading into memory directly.

I actually spent several hours trying to make that work because I thought you had been able to 🙂 Glad to know you ended up loading from S3 directly as well.

Thank you for the feedback!
I did not remove the libraries manually.
I will update the repository.

Has anyone successfully run fast.ai on an embedded system like a Raspberry Pi? Does it have enough power for deep learning, or can you suggest alternatives?

Thank you in advance.


Does anyone know of a way to uninstall the Zeit Now npm CLI tools? I’d never heard of this company before this class, and I’m hesitant to install something to my terminal; I don’t know if, what, or how it’ll talk back to Zeit while I’m not looking.

Apparently a bunch of other people have been having trouble uninstalling both the desktop app and the CLI tools:

Looks like for the CLI you’ll need to hunt for its files manually:

And the desktop app will leave a lot of files as cache behind:


I’m trying this out on a VM, and maybe a cloud VM if that doesn’t work. It’s a hassle, but I don’t want to risk adding junk or security threats to my computer. How safe are these tools to use?


edit: I think one way to uninstall the Zeit Now CLI tools is to remove /usr/local/lib/node_modules/now/, though that’s on Ubuntu Linux; I’m unsure about other OSes, and not sure whether that gives a clean uninstall or leaves other files or config changes behind. (Since it’s a globally installed npm package, running npm uninstall -g now should also work.)


edit2:

Has Zeit deployment changed? I’m running Now from the course deployment docs, and I’m getting failed builds after about a 30-minute wait after entering now.

Sometimes it fails at Step 3/9 (RUN apt install -y python3-dev gcc, using cache); other times it fails at Step 7/9, saying it couldn’t find some numpy umath library.

I’ll update this if I find a solution.


edit3:

I tried again with a fresh start. I created a new VM on GCP. Once it was online, apt couldn’t find npm via sudo apt install npm, so I set up the machine as follows (see: stackoverflow link):

curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo apt-get install -y build-essential
sudo apt install npm
sudo npm install -g now
sudo npm i -g --unsafe-perm now

Then following the instructions from the fast.ai deployment tutorial:

wget https://github.com/fastai/course-v3/raw/master/docs/production/zeit.tgz
tar xf zeit.tgz
cd zeit
now

After confirming email and running now again, this was the result:

jupyter@instance-1:~/zeit$ now
> WARN! You are using an old version of the Now Platform. More: https://zeit.co/docs/v1-upgrade
> Deploying ~/zeit under blah@gmail.com
> https://zeit-vybcijlirb.now.sh [v1] [in clipboard] (sfo1) [2s]
> Building…
> Sending build context to Docker daemon  31.74kB
> Status: Image is up to date for python:3.6-slim-stretch
>  ---> Using cache
>  ---> 370bd47378c2
> Step 7/9 : RUN python app/server.py
>  ---> Using cache
> Step 3/9 : RUN apt install -y python3-dev gcc
>  ---> 9799e0e87f00
>  ---> Using cache
>  ---> 68038eb9c796

> Error! Build failed

The deployment site itself looked like this. It hung on Storing image for most of the time until returning the Build failed error in terminal.

@arunoda any thoughts? My guess is the tutorial code/docker was configured for an earlier version of fastai.

Hey @Borz, I have the exact same issue and can no longer deploy apps on Zeit using the command line (mine also ends with ‘Error! Build failed’ each time). I previously deployed the exact same app on Zeit about a week ago with no issues.

When I try to deploy from Zeit’s Now desktop app (instead of typing ‘now’ in the command line), the error says:

"The built image size (1.8G) exceeds the 100MiB limit" (see screenshot below)

Does anyone know how to resolve? I imagine a change to a library or requirement is driving this size increase, but I don’t have the technical expertise to ID the root cause. The free tier with Zeit requires individual files be < 100MB.

@pankymathur - thanks for the new guide on deploying apps on AWS Beanstalk and Google App Engine that you mentioned in this post, with the new guide here. Unfortunately, I ran into the same issue on AWS (as I did with Zeit) using the starter pack. Any thoughts on a fix for AWS or Zeit?

I think the v1 zeit now didn’t have the same 100MB restriction? I may be wrong. FWIW I think I got a webapp deployed via Google App Engine from the course tutorial. Just gotta check on turning it off since I don’t think it’s free.

I’d be interested if someone gets zeit working; my guess is it requires some fiddling with the app/server.py or some other file, but I’m not diving into that right now.

Thanks. I can confirm that v1 of Zeit Now also had the 100MB (per individual file) restriction.
Originally, my model (.pth file) was ~110MB, and I had to reduce the size of the training data several times until the model was finally ~98MB; then Zeit deployed it OK.
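Since the limit is per individual file, a quick pre-deploy check like the sketch below can save a slow upload round-trip. The directory name in the commented usage is hypothetical:

```python
import os

LIMIT = 100 * 2**20  # Zeit's 100 MiB per-file limit

def oversized_files(root, limit=LIMIT):
    """Return paths under `root` whose size exceeds the per-file limit."""
    offenders = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > limit:
                offenders.append(path)
    return offenders

# Example: flag any file that would be rejected, before running `now`.
# for path in oversized_files("zeit"):
#     print("too big:", path)
```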

I’d also be interested in zeit if someone can get it working. If there’s no solution, it seems like Google’s App Engine may be the way to go.

I’ve also experienced some issues following the deployment tutorial:


Any ideas how to fix it?


the toy example is great! thanks for sharing - any tips on how to modify Jeremy’s tutorial to serve a regression model (e.g. Rossmann) through Now?
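Not a full answer, but the main change is in the route: instead of reading an uploaded image and returning a class label, the analyze endpoint would accept a JSON row of features and return a number. A framework-agnostic sketch of that handler logic; the predict callable and the feature names stand in for a loaded Rossmann learner and are purely hypothetical:

```python
import json

def handle_analyze(body, predict):
    """Decode a JSON feature row, run the model, and encode the prediction."""
    features = json.loads(body)      # e.g. {"Store": 1, "Promo": 0, ...}
    prediction = predict(features)   # stand-in for the learner's predict call
    return json.dumps({"prediction": prediction})

# With Starlette (as in the course tutorial) this would sit inside the route,
# roughly:
# @app.route("/analyze", methods=["POST"])
# async def analyze(request):
#     body = await request.body()
#     return JSONResponse(json.loads(handle_analyze(body, predict_sales)))
```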


When installing fastai using pip, I was getting the ModuleNotFoundError: No module named 'numpy.core._multiarray_umath message. Downgrading bottleneck package to v1.2.0 solved the problem. Try pip install Bottleneck==1.2.0.


Apparently, it is also possible to deploy fastai models using AWS Sagemaker. There was a presentation about it recently at the AWS re:Invent conference (see slides here), and 2 repos connected to the talk:

Has anyone explored this option yet? If so, what was your experience with it?


Thank you for the reply! First of all, for everyone who is as new to this as I am: to change the version of Bottleneck, you need to go to the requirements.txt in your zeit folder and append “Bottleneck==1.2.0” to it.
Unfortunately, that didn’t solve the problem. Please see the screenshot below:

After a bit of trial & error testing, I was finally able to deploy on Zeit again by updating the fastai version to 1.0.34 in the requirements.txt file (see below). At the time of this posting, the latest fastai library is 1.0.38.

Does anyone know if there’s a better way to approach the Zeit deployment issue (apps aren’t deploying on recent fastai versions, may be related to file size limitations on Zeit)?

requirements.txt file:

-f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
torch_nightly
fastai<=1.0.34
starlette
uvicorn
python-multipart
aiofiles
aiohttp


Regarding the “ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'” issue: it has nothing to do with deployment on Zeit, AWS Beanstalk, or Google App Engine. You can try a local Docker image build and it will throw the same error.

It’s a numpy-related issue associated with fast.ai’s internal numpy usage. So, in order to make the latest version of fast.ai (v1.0.39) work with numpy, the requirements.txt file needs to be updated. I have made the same update in my own starter packs and will be submitting a starter pack to the course-v3 repo too.

I have tested my starter packs on AWS Beanstalk with a t3.medium instance and they work fine; I will try to test Azure Web App for Containers and Google App Engine later too.

As for Zeit deployment, I am not sure I want to continue using their services, as Zeit has recently made a lot of changes to enforce the 100 MB limit. I was not able to resolve the 100 MB limit issue even with an unlimited paid plan, and even when I explicitly used version 1 of the platform via the CLI. They are definitely missing a huge business opportunity here.

In the meantime, try changing your apps’ requirements.txt to these lines, in exactly this order, and let me know if you still face issues.

requirements.txt file:

numpy==1.16.0rc1
fastai
starlette
uvicorn
python-multipart
aiofiles
aiohttp