Deployment Platform: AWS Lambda

Hi, I went through the guide and have already deployed the model to AWS. When I run the log command, this problem occurs:

I tried streicher's solution. It seems there is a problem when exporting the model:

trace_input = torch.ones(1, 3, 299, 299).cuda()
jit_model = torch.jit.trace(learner.model.float(), trace_input)

The error is: Expected more than 1 value per channel when training, got input size torch.Size([1, 1024])

Hi @matus66. It may help to check whether your model runs as you expect by attempting an inference before starting the jit trace. The model needs to be loaded in memory for the jit trace to map it correctly. The example at https://github.com/fastai/course-v3/blob/master/docs/production/lesson-1-export-jit.ipynb shows how to export a model that has just been trained. I wrote a small Jupyter notebook to import a model that was previously trained by someone else and saved as a .pth file. I attach a PDF showing the notebook run: ModelExport - Jupyter Notebook.pdf (101.9 KB)
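For the specific error above, a minimal sketch of the usual fix (an assumption about the cause, not something confirmed in the thread): BatchNorm layers raise "Expected more than 1 value per channel when training" when run on a batch of one in training mode, so put the model in eval mode before the test inference and the trace. `learner` here is the same object as in the snippet above, assumed to be on the GPU:

import torch

# eval mode: BatchNorm uses running stats instead of per-batch statistics
model = learner.model.float().eval()

trace_input = torch.ones(1, 3, 299, 299).cuda()  # same shape as in the snippet above
with torch.no_grad():
    _ = model(trace_input)  # sanity-check inference before tracing

jit_model = torch.jit.trace(model, trace_input)
jit_model.save('model.pt')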

I have a segmentation model I’m trying to deploy to Lambda; however, it times out when calling the traced TorchScript model(input) function.

I tried retraining the model in a PyTorch 1.1 conda environment as suggested recently; however, that version of PyTorch is unable to trace learners with hooks, which, as I understand it from the lessons, are pretty integral to implementing the cross-connections of a segmentation network.

Has anyone successfully made a PyTorch 1.4 layer for AWS Lambda? Or does anyone have alternative suggestions for other things to try?

@matt.mcclean Is it possible to share the build script or source you used to create the original lambda layers? I’ve trained a new model that is incompatible with the old version as well. Thanks for all your great work on this deployment method.

I saw that warning too, but the actual error was a timeout right afterwards. I increased the Timeout to 120 seconds in template.yaml and it worked locally. I’m guessing that’s only the cold start taking so long, and it seems to be extra slow when testing locally.
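For reference, a sketch of where that setting lives in a SAM template.yaml (the resource name and handler below are placeholders, not from this thread; the default Timeout is only 3 seconds, far too short for a cold PyTorch import):

Resources:
  InferenceFunction:               # placeholder resource name
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler
      MemorySize: 1024
      Timeout: 120                 # seconds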

I believe the urllib3+chardet error gets resolved when you train and export your model using PyTorch 1.1. I also resolved it by adding another layer to my Lambda that had current versions of those two libraries, but in that case the model fails to load (due to changes between 1.4 and 1.1?).

I’ve tried installing PyTorch 1.4 without CUDA and get a 124 MB zip (which can be uploaded to S3, but can’t be used as a layer because it is over 250 MB once extracted). Lambda has somewhat confusing limits here: a ZIP uploaded directly cannot be over 50 MB, and uploads over that limit are blocked, but a ZIP referenced from an object already in S3 can be over 50 MB, as long as it is under 250 MB once extracted.
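Concretely (the bucket, key, and layer names below are hypothetical), the direct-upload path is rejected for large ZIPs while the S3 path works, provided the extracted size stays under 250 MB:

# Direct upload: blocked for ZIPs over 50 MB
aws lambda publish-layer-version --layer-name pytorch --zip-file fileb://pytorch-layer.zip

# Via S3: accepted, as long as the extracted contents stay under 250 MB
aws s3 cp pytorch-layer.zip s3://my-layer-bucket/pytorch-layer.zip
aws lambda publish-layer-version --layer-name pytorch \
    --content S3Bucket=my-layer-bucket,S3Key=pytorch-layer.zip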

To get around all this mess, right now I’m looking at training/exporting my models in PyTorch 1.4 or 1.5, and using ONNX to export the model and run inference in Lambda.
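A minimal sketch of that ONNX path (assuming an image model with 3x224x224 inputs; the point is that the onnxruntime CPU wheel is far smaller than torch, so only it needs to ship in the Lambda package):

import torch
import onnxruntime as ort

model.eval()  # assumption: `model` is the trained PyTorch model, on CPU
dummy = torch.randn(1, 3, 224, 224)

# At training time: export once
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)

# In the Lambda handler: torch is no longer needed
sess = ort.InferenceSession("model.onnx")
preds = sess.run(None, {"input": dummy.numpy()})[0]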

@mihow @watanabe I made a script for packaging specified versions of PyTorch + Python into a Lambda Layer, along with examples for deploying it with Serverless Framework and Terraform: https://github.com/JTunis/create-pytorch-lambda-layer.

I haven’t gotten around to testing a bunch of different versions yet, but I’m currently using it with Python 3.8 and torch 1.5.1. Let me know if you have issues or suggestions!


@matt.mcclean I’ve been reading through some of your work on building an AWS Lambda Layer for fastai. Where exactly did you find the publicly available PyTorch Lambda Layer ARNs? I’m trying to track down the source so I know where to look when it’s updated. Thanks in advance!

Edit: I just realized that you may be the owner/publisher of the Lambda Layer. Is this something you created or something you found? Thanks again!

AWS Lambda has just announced support for Container images to package your code and dependencies. I have setup an example project using the SAM CLI here: https://github.com/mattmcclean/fastai-container-sam-app.

Container images can be up to 10 GB in size, getting around the previous issue of the PyTorch package being too large for the Lambda zip file.

Would love to hear your feedback!


Thanks @matt.mcclean for creating the container approach using fastai. I want to make sure I understand the whole process based on what you have written in the GitHub repo, so I am summarizing my understanding in order to do it myself. Apologies if some of my questions seem basic.

Before that, I am sharing what I have done, followed by the steps to create the Docker container.

  1. We have created a recommendation model using collaborative filtering. We will export the model using learn.export() (a minimal sketch follows this list).
  2. Build and deploy the application - I am assuming this deploys the Docker container that was created, via a CloudFormation template, and that the model can then be accessed through an API call to the Lambda function.
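For step 1, a minimal sketch of what that export might look like (the DataLoaders variable and the hyperparameters are assumptions for illustration, not details from this thread):

from fastai.collab import collab_learner

# `dls` is assumed to be CollabDataLoaders built from the ratings data
learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
learn.fit_one_cycle(5)
learn.export('export.pkl')  # writes the pickled Learner that load_learner() reads at inference time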

Q1: I need some clarity here - with this API, can we directly call predict on the model, or do we first need to load the model and then call predict?
Q2: For the recommendation model, we need to train the model daily, i.e. once per day before the start of the business day, as we get new data for the previous day which may impact the recommendation output. Can this be handled automatically using the model export?
Q2.1: If the automation is not possible using the Docker container/Lambda, is there an alternative approach to the deployment?

Thanks in advance for your help.

Regards
Ganesh Bhat

Hi there, I would recommend using the new approach of bundling the fastai libraries in a Docker container as per the example project here. Lambda layers still have a max limit of 250 MB, which is too small for the fastai + PyTorch libs.


Q1. In the example project on GitHub, the model is loaded when the AWS Lambda function execution environment is started, meaning that it is done once and the model can then be called multiple times. See the code snippet here showing the load_learner() function called before the Lambda handler function:

from fastai.vision.all import load_learner

learn = load_learner('export.pkl')  # runs once, when the execution environment starts

def lambda_handler(event, context):
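For completeness, a hedged sketch of what the rest of such a handler could look like for a vision model (the event shape and field names are assumptions, not taken from the example project):

import base64, json
from fastai.vision.all import load_learner, PILImage

learn = load_learner('export.pkl')  # once per execution environment

def lambda_handler(event, context):
    # Assumption: the request body carries base64-encoded image bytes
    img = PILImage.create(base64.b64decode(event['body']))
    pred, pred_idx, probs = learn.predict(img)
    return {'statusCode': 200,
            'body': json.dumps({'class': str(pred), 'prob': float(probs[pred_idx])})}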

Q2. If you need to automatically rebuild your model, then I would consider using something like the SageMaker Training service, which you can automate to build your model on a daily basis and save it somewhere like S3. In the example, the fastai code and model are bundled into a Docker container and pushed to the AWS Elastic Container Registry.

Q3. You could use a service like SageMaker Pipelines or AWS Step Functions to automate the entire sequence of steps to build your model, publish it, and then deploy it.


@matt.mcclean is it possible that AWS will release an official base image including fast.ai? Building the image is very slow…

One other tip: I am using GitHub Actions to push the image to ECR, and it works nicely.

Thanks for this forum, guys, and @matt.mcclean. I am facing the issue below when deploying a simple vision model similar in construct to the fastai cat/dog deployment. Can someone please let me know if they have faced a similar issue? Thanks in advance! :)

PS: I was able to deploy successfully to AWS, if that helps. I get the same error when I test invocation there too.

Aruns-MacBook-Air:fastai-container-sam-app arunboss$ sam local invoke FastaiVisionFunction --event events/event.json
Invoking Container created from fastaivisionfunction:python3.7-v1
Building image…
Skip pulling image and use local one: fastaivisionfunction:rapid-1.40.1-x86_64.

START RequestId: 7691bb06-5c7a-4bb2-a88d-493305a675ea Version: $LATEST
04 Mar 2022 13:26:34,105 [ERROR] (rapid) Init failed error=Runtime exited with error: exit status 1 InvokeID=
END RequestId: ed3fe1ce-abb2-4549-8a13-a9e4199a0976
REPORT RequestId: ed3fe1ce-abb2-4549-8a13-a9e4199a0976  Init Duration: 2.50 ms  Duration: 4710.43 ms  Billed Duration: 4711 ms  Memory Size: 1048 MB  Max Memory Used: 1048 MB

{"errorMessage": "module 'typing' has no attribute '_ClassVar'", "errorType": "AttributeError"}

Stack trace:
  File "/var/lang/lib/python3.7/imp.py", line 234, in load_module
    return load_source(name, filename, file)
  File "/var/lang/lib/python3.7/imp.py", line 171, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 696, in _load
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/var/task/app.py", line 7, in <module>
    from fastai.vision.all import load_learner, PILImage
  File "/var/task/fastai/vision/all.py", line 1, in <module>
    from . import models
  File "/var/task/fastai/vision/models/__init__.py", line 1, in <module>
    from . import xresnet
  File "/var/task/fastai/vision/models/xresnet.py", line 12, in <module>
    from ..torch_basics import *
  File "/var/task/fastai/torch_basics.py", line 9, in <module>
    from .imports import *
  File "/var/task/fastai/imports.py", line 25, in <module>
    import requests,yaml,matplotlib.pyplot as plt,pandas as pd,scipy
  File "/var/task/matplotlib/pyplot.py", line 49, in <module>
    import matplotlib.colorbar
  File "/var/task/matplotlib/colorbar.py", line 21, in <module>
    from matplotlib import _api, collections, cm, colors, contour, ticker
  File "/var/task/matplotlib/contour.py", line 13, in <module>
    from matplotlib.backend_bases import MouseButton
  File "/var/task/matplotlib/backend_bases.py", line 46, in <module>
    from matplotlib import (
  File "/var/task/matplotlib/textpath.py", line 8, in <module>
    from matplotlib import _text_helpers, dviread, font_manager
  File "/var/task/matplotlib/_text_helpers.py", line 12, in <module>
    "LayoutItem", ["char", "glyph_idx", "x", "prev_kern"])
  File "/var/task/dataclasses.py", line 1133, in make_dataclass
    unsafe_hash=unsafe_hash, frozen=frozen)
  File "/var/task/dataclasses.py", line 958, in dataclass
    return wrap(_cls)
  File "/var/task/dataclasses.py", line 950, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
  File "/var/task/dataclasses.py", line 801, in _process_class
    for name, type in cls_annotations.items()]
  File "/var/task/dataclasses.py", line 801, in <listcomp>
    for name, type in cls_annotations.items()]
  File "/var/task/dataclasses.py", line 659, in _get_field
    if (_is_classvar(a_type, typing)
  File "/var/task/dataclasses.py", line 550, in _is_classvar
    return type(a_type) is typing._ClassVar
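One likely culprit, judging from the trace (an assumption, not confirmed in the thread): the final frames come from /var/task/dataclasses.py, i.e. a pip-installed dataclasses backport bundled into the image instead of the Python 3.7 standard-library module. The backport targets Python 3.6, where typing._ClassVar still existed, so it breaks on 3.7. If that is the cause, removing the backport from the container (e.g. pip uninstall dataclasses in the Dockerfile, or keeping the dataclasses package out of requirements.txt) should clear the AttributeError.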

FYI - I had to change the requirements file to get a successful build. Basically, I had to remove the original version constraints on torch & torchvision.

Any suggestions from this group? Please do let me know if I have shared enough info; happy to share more. @matt.mcclean & other nice folks here! :)

requirements.txt is below. Notice the version changes - I hope that is not the cause.
torch
torchvision
fastai

When I tried with the original requirements.txt,
torch==1.7.0+cpu
torchvision==0.8.1+cpu
fastai

the build failed with the error below:
Collecting torch==1.7.0+cpu (from -r vision/requirements.txt (line 1))
ERROR: Could not find a version that satisfies the requirement torch==1.7.0+cpu (from -r vision/requirements.txt (line 1)) (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2)
ERROR: No matching distribution found for torch==1.7.0+cpu (from -r vision/requirements.txt (line 1))
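For what it's worth, the +cpu local-version wheels are not published on PyPI, only on PyTorch's own wheel index, which is why the pinned versions fail to resolve. Assuming that index still hosts those builds, adding a find-links line to requirements.txt should let pip find the original pins:

-f https://download.pytorch.org/whl/torch_stable.html
torch==1.7.0+cpu
torchvision==0.8.1+cpu
fastai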