Deployment Platform: AWS Lambda

Hi everyone!

Both during local testing and deployment of the sample app I keep getting the following error:

Unable to import module 'app': libcudart.so.9.0: cannot open shared object file: No such file or directory

It seems to happen at the initial import of torchvision. My local machine has no CUDA GPU, but I presume that should not be a problem.

Could you advise on how this could be solved?


Did you ever figure out a workaround for your problem of deploying to Lambda?

Hi, how do we find out about the layer ARNs? I am looking everywhere and there is no listing from AWS.
Are the only layer ARNs the two listed here? arn:aws:lambda:AWS_REGION:934676248949:layer:pytorchv1-py36:1 (PyTorch 1.0.1)

I have tried the instructions given at https://course.fast.ai/deployment_aws_lambda.html. But when I try to hit the URL it says {"message":"Forbidden"}. The command I am trying is curl -d "{"url":"https://www.holidify.com/blog/wp-content/uploads/2015/11/Gulmarg-Golf-Course-new.jpg"}" -H "Content-Type: application/json" -X POST https://qr3cinlgrl.execute-api.us-east-2.amazonaws.com/invocations
Any help is appreciated.

Concerning the trace_input parameter, what value would we set when using text classification?
The text length varies depending on the text, so a fixed value would not be the right way, or am I missing something?

You are almost there.
I think this is because you are POSTing to port 80, the default; you need to send your request to port 3000. Note the :3000 at the end of the endpoint. E.g. you need:

curl -d "{\"url\":\"https://www.holidify.com/blog/wp-content/uploads/2015/11/Gulmarg-Golf-Course-new.jpg\"}" \
 -H "Content-Type: application/json" \
 -X POST https://qr3cinlgrl.execute-api.us-east-2.amazonaws.com:3000/invocations

PS: I tried it and your application times out. Look at the logs, or check out my deployment, which is basically the same as the documentation you pointed to: https://github.com/brunosan/iris-ai/tree/master/iris-aws-lambda

I’ve just followed the instructions and deployed my code using that guide, thank you @matt.mcclean

Love the fact that there is no running cost as this is serverless, and that for most of us, it is also free as the per-run cost is going to be well below the free tier (except for the very small cost to host the model on S3).
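As a back-of-envelope sketch of why the per-run cost stays inside the free tier (the memory size and duration below are hypothetical; the 400,000 GB-second monthly compute allowance is AWS's published free tier figure):

```python
# Rough Lambda cost model. Assumed (hypothetical) workload: 1 GB of memory,
# 1 second per inference. AWS free tier: 400,000 GB-seconds per month.
memory_gb = 1.0
seconds_per_call = 1.0
free_gb_seconds = 400_000

# How many inference calls the compute free tier covers each month:
calls_covered = free_gb_seconds / (memory_gb * seconds_per_call)
print(int(calls_covered))  # 400000
```

So even at a generous one second per prediction, hundreds of thousands of calls a month cost nothing beyond the S3 hosting.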

Here it’s my code https://github.com/brunosan/iris-ai/tree/master/iris-aws-lambda

I found that local development is VERY slow, since every POST takes up to a minute to answer. When deployed, the lambda also takes some time to answer the first time, but then responds with basically no delay. This is called "cold start"; to avoid it I make a dummy call from the front-end as soon as the page loads, so the API is warm by the time the first real call is made.
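A minimal sketch of that warm-up trick, here as a background Python ping rather than the front-end call (the fire-and-forget pattern is the point; any request to the endpoint warms the container):

```python
import threading
import urllib.request

def warm_up(endpoint, timeout=5):
    """Fire a throwaway request so the Lambda container is initialized
    before the first real call. Returns True if the endpoint answered."""
    try:
        urllib.request.urlopen(endpoint, timeout=timeout)
        return True
    except Exception:
        # A cold-start timeout or error here is fine; the ping itself
        # is what warms the container.
        return False

# Kick it off in the background as soon as the app starts:
threading.Thread(
    target=warm_up,
    args=("https://qr3cinlgrl.execute-api.us-east-2.amazonaws.com/invocations",),
    daemon=True,
).start()
```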


I’d like to share my experiences in getting this solution to work:

  1. The hard cap on AWS Lambda at 500MB is a huge challenge.

  2. For the last 2 days I’ve read and tried all the different solutions I could find, in multiple iterations:

  • precompiled lambda packs
  • compiling from source

  3. Using the unzip_requirements.txt + Lambda layers strategy is the way to go (thank you @matt.mcclean for the excellent docs on how to get this running).

  4. Calling np.transpose on a tensor does not work with the following items in the requirements.txt file:
    numpy==1.11.3
    https://download.pytorch.org/whl/cpu/torch-1.1.0-cp36-cp36m-linux_x86_64.whl
    torchvision
    which is what is inside the requirements.txt under https://github.com/mattmcclean/sam-pytorch-example/blob/master/pytorch/requirements.txt

  5. I discovered that I could get my image processing pipeline to work by

TL;DR - once you have custom transformations and models, try finding the cause through debugging, testing the interactions between different library versions! + the unzip_requirements.py strategy rocks
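To illustrate the np.transpose point above (shapes here are made up): axis reordering on a plain ndarray is fine; the version-sensitive part is handing a torch tensor to numpy, which permute avoids entirely.

```python
import numpy as np

# Stand-in for a C,H,W image as a plain numpy array.
chw = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Reorder axes to H,W,C -- np.transpose is fine on ndarrays.
hwc = np.transpose(chw, (1, 2, 0))
print(hwc.shape)  # (3, 4, 2)

# With a torch tensor t of the same shape, t.permute(1, 2, 0) does the
# same reordering without crossing the numpy boundary at all, which is
# what tripped over the mismatched numpy/torch versions.
```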

I’m attempting to modify the IMDB example and deploy to lambda as well.

I’m having a similar issue. I’m really curious how you got past this.

Interesting. What made you think it wasn’t supported?

Hi everybody! Thanks Matt for the guidelines, they helped a lot.
However I’m stuck at this point when running locally:

sam local invoke PyTorchFunction -n env.json -e event.json

which throws this error:

File "/home/michal/.local/lib/python3.7/site-packages/botocore/client.py", line 586, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidParameterValueException: An error occurred (InvalidParameterValueException) when calling the GetLayerVersion operation: Invalid Layer name: arn:aws:lambda:eu-west-1:934676248949:layer:pytorchv1-py36

I am using your yaml template, just changed the region:
Default: "arn:aws:lambda:eu-west-1:934676248949:layer:pytorchv1-py36:1"

My S3 bucket is in this same region. I tried many things but I cannot find what the problem is. Does anyone have an idea?

Update:
I have created my own PyTorch layer, but I still have the same error message.

Update2:
My awscli was configured for the wrong region :grinning:

Hi Everybody,

Big THANKS to @matt.mcclean for putting this Lambda deployment guide together.

I’m super-excited about being able to deploy with Lambda but I’m running into this problem when testing locally or after deploying to Lambda:
/tmp/sls-py-req/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.7) or chardet (3.0.4) doesn’t match a supported version!

Eternal gratitude if anybody can please help solve this.

Interested to know if anyone has deployed a fastai text classifier (v1 pretrained AWD_LSTM + fine tuning) to lambda+api gateway using the torchscript method from the course docs.

I’ve trained a product categorization model which I’d really like to be able to deploy for an internal POC.

I’m currently hitting issues with the torch.jit.trace part. The course docs give the example below, where I understand the shape (it is being used for images). When trying to modify this for a text classifier I’m not sure what to set these values to.

# example from the docs
trace_input = torch.ones(1,3,299,299).cuda()
jit_model = torch.jit.trace(learn.model.float(), trace_input)

The model just takes in a single string and outputs predictions across 817 classes.

I’ve tried changing this to the below and got the following error.

# example I've tried
trace_input = torch.ones(1,817).cuda()
jit_model = torch.jit.trace(learn.model.float(), trace_input)

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)

I’ve tried other combinations of this but have largely gotten the same error.

I’ve searched these forums and Google but can’t find anything that really helps.

Any help would be greatly appreciated (even if it is simply "no, this method doesn’t work for text models because of xyz").

My learner doesn’t have anything special; it’s mostly the example text classifier from the docs.
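For what it’s worth, the "indices ... Long" part of the error is reproducible in isolation: embedding layers want integer token indices, not floats, and the second dimension should be sequence length, not the number of classes (817 belongs in the output). A toy sketch (the tiny model here is hypothetical, not the AWD_LSTM, which may still fail to trace for other reasons):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a text classifier: its embedding layer, like
# the real model's, only accepts Long token indices.
class TinyTextClassifier(nn.Module):
    def __init__(self, vocab_size=100, emb_dim=8, n_classes=817):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.head = nn.Linear(emb_dim, n_classes)

    def forward(self, x):
        # Mean-pool embeddings over the sequence, then classify.
        return self.head(self.emb(x).mean(dim=1))

model = TinyTextClassifier()

# (batch, seq_len) of token indices: the dtype must be long, not float.
trace_input = torch.ones(1, 10, dtype=torch.long)
jit_model = torch.jit.trace(model, trace_input)
print(jit_model(trace_input).shape)  # torch.Size([1, 817])
```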


A while ago (maybe a year back) I know it didn’t work for text models. Haven’t looked at it since.

Thanks for the reply Matt.

I suspected it didn’t, as I saw some discussions on the PyTorch forums, but I didn’t really understand the reasons they were giving, so I wasn’t sure if it was an issue for me or not. I ended up starting with the Render starter repo, then scrapped the UI part and wrote a small API with FastAPI.

For anyone else who primarily needs a backend web service rather than a fancy UI, I can recommend this approach, as FastAPI gives a nice Swagger interface with little effort. It makes testing the API endpoints easy.

Interesting to hear you’re using FastAPI - that’s what we (@aychang) opted for when we created AdaptNLP, which is our fastai-inspired wrapper for Hugging Face and Flair.


Hi everybody,

Many thanks to @matt.mcclean for the guide, it helped a lot.
However I’m stuck: when testing locally I get an import error for "libtorch.so".

I’m using a custom Lambda layer, as the links provided do not work for me.
In "create_layer_zipfile.sh" I saw that "libtorch.so" is removed:

rm ./torch/lib/libtorch.so 

Not removing it creates a zip file that exceeds the 250 MB AWS Lambda limit; removing it gives the error mentioned above.

The only difference from Matt’s code is that I have changed the download link in the requirements to torch 1.2.0.

What am I missing?

Hi, I’ve been trying to get this to work but I’m having a problem when Lambda tries to open the model.
Here is the log grabbed from AWS CloudWatch, and it looks like PyTorch can’t read the model properly? I trained this model using PyTorch 1.4 and Python 3.7.
Would it make sense to downgrade to PyTorch 1.1 and reconstruct the model from there?

[INFO] 2020-05-07T17:04:57.431Z Loading model from S3

17:04:59
Model file is : res50_stage1_v4.pth

17:04:59
Loading PyTorch model

17:05:00
module initialization error: version_number <= kMaxSupportedFileFormatVersion ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:131, please report a bug to PyTorch. Attempted to read a PyTorch file with version 2, but the maximum supported version for reading is 1. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:131)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f6d223d9441 in /tmp/sls-py-req/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f6d223d8d7a in /tmp/sls-py-req/torch/lib/libc10.so)
frame #2: caffe2::serialize::PyTorchStreamReader::init() + 0xed1 (0x7f6d23fc8431 in /tmp/sls-py-req/torch/lib/libcaffe2.so)
frame #3: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::istream*) + 0x48 (0x7f6d23fc90f8 in /tmp/sls-py-req/torch/lib/libcaffe2.so)
frame #4: torch::jit::import_ir_module(std::function<std::shared_ptr<torch::jit::script::Module> (std::vector<std::string, std::allocator<std

17:05:00
END RequestId: 8cafea90-765c-4a19-8ce0-8525956ad0ce

17:05:00
REPORT RequestId: 8cafea90-765c-4a19-8ce0-8525956ad0ce Duration: 149.70 ms Billed Duration: 200 ms Memory Size: 3008 MB Max Memory Used: 446 MB

17:05:00
/tmp/sls-py-req/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.8) or chardet (3.0.4) doesn’t match a supported version!
RequestsDependencyWarning)

I had the exact same problem and downgrading to torch 1.1.0 solved it. I think the format of the jit trace file changed between PyTorch versions.

To downgrade for this export without breaking my fastai2 install, I made a new conda environment as follows:

conda create -n pytorch11 python=3.6 pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0
conda activate pytorch11
conda install -c pytorch -c fastai fastai pytorch=1.1.0 torchvision=0.3.0 cuda100 jupyter jupyterlab
conda install boto3
jupyter notebook password
jupyter notebook --ip=10.0.1.100
python -c "import torch; print(torch.__version__)" # this command should print 1.1.0

I then imported the old model in jupyter notebook with:
from fastai.vision import *
classes = ['Anger', 'Disgust', 'Surprise', 'Sadness', 'Happiness', 'Neutral', 'Contempt', 'Fear']
data = ImageDataBunch.single_from_classes('', classes, ds_tfms=None)
learner = cnn_learner(data, models.resnet34)
learner.load('gokul-sentiment-stage-5n')

… and from here on I followed the description at https://course.fast.ai/deployment_aws_lambda.html from "Export your trained model and upload to S3".

Thanks so much!
I’ll give that a shot. Currently, to overcome this problem, I just went with Google Cloud Functions. It seems slow (800 ms per inference on a ResNet50), so I want to give AWS Lambda another try.