Deployment Platform: AWS Lambda

Hi everyone!

Both when testing locally and when deploying the sample app I keep getting the following error:

Unable to import module 'app': libcudart.so.9.0: cannot open shared object file: No such file or directory

It seems to occur at the initial import of torchvision. My local machine has no CUDA GPU, but I presume that should not be a problem.

Could you advise on how this could be solved?


Did you ever figure out a workaround for your problem of deploying to a lambda?

Hi, how do we find out about the layer ARNs? I am looking everywhere and there is no listing from AWS.
Are the only layer ARNs the two listed here, e.g. arn:aws:lambda:AWS_REGION:934676248949:layer:pytorchv1-py36:1 (PyTorch 1.0.1)?

I have tried the instructions given at https://course.fast.ai/deployment_aws_lambda.html, but when I try to hit the URL it says {"message":"Forbidden"}. The command I am trying is curl -d '{"url":"https://www.holidify.com/blog/wp-content/uploads/2015/11/Gulmarg-Golf-Course-new.jpg"}' -H "Content-Type: application/json" -X POST https://qr3cinlgrl.execute-api.us-east-2.amazonaws.com/invocations
Any help is appreciated.

Concerning the trace_input parameter, what value would we set when using text classification?
The text length varies from example to example, so a fixed value would not seem right, or am I missing something?

You are almost there.
I think this is because you are "POST"ing to port 80, the default; you need to send your request to port 3000. Pay attention to the :3000 at the end of your endpoint. E.g. you need:

curl -d "{\"url\":\"https://www.holidify.com/blog/wp-content/uploads/2015/11/Gulmarg-Golf-Course-new.jpg\"}" \
 -H "Content-Type: application/json" \
 -X POST https://qr3cinlgrl.execute-api.us-east-2.amazonaws.com:3000/invocations

PS: When I try it, your application times out. Look at the logs, or check out my deploy, which is basically the same as the documentation you pointed to: https://github.com/brunosan/iris-ai/tree/master/iris-aws-lambda

I've just followed the instructions and deployed my code using that guide, thank you @matt.mcclean

Love the fact that there is no running cost as this is serverless, and that for most of us, it is also free as the per-run cost is going to be well below the free tier (except for the very small cost to host the model on S3).

Here's my code: https://github.com/brunosan/iris-ai/tree/master/iris-aws-lambda

I found that local development is VERY slow: every POST takes up to a minute to answer. When deployed, the lambda also takes some time to answer the first time, but after that it responds with essentially no delay. This is called a "cold start", and to avoid it I make a dummy call from the front-end as soon as the page loads (see the sketch below), so the API is already warm when the first real call is made.
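In my deploy the dummy call is a one-line fetch in the page's JavaScript, but the idea is language-agnostic; here is a hedged sketch of the same warm-up ping in Python (the endpoint is a placeholder, not a real URL):

import requests

# fire a throwaway request as soon as the app loads so the Lambda
# container is already warm when the first real request arrives
WARMUP_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/invocations"
try:
    requests.post(WARMUP_URL, json={"url": ""}, timeout=30)
except requests.RequestException:
    pass  # a failed warm-up is harmless; the real call cold-starts instead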


I'd like to share my experience getting this solution to work:

  1. The hard cap on AWS Lambda at 500 MB is a huge challenge.

  2. For the last two days I've read and tried all the different solutions I could find, in multiple iterations:

  • precompiled lambda packs
  • compiling from source

  3. Using the unzip_requirements + Lambda layers strategy is the way to go (thank you @matt.mcclean for the excellent docs on how to get this running); see the sketch after this list.

  4. Calling np.transpose on a tensor does not work with the following items in the requirements.txt file:
    numpy==1.11.3
    https://download.pytorch.org/whl/cpu/torch-1.1.0-cp36-cp36m-linux_x86_64.whl
    torchvision
    which is what is inside the requirements.txt under https://github.com/mattmcclean/sam-pytorch-example/blob/master/pytorch/requirements.txt

  5. I discovered that I could get my image-processing pipeline to work by testing different combinations of library versions.

TL;DR - once you have custom transformations and models, try to find the cause by debugging the interactions between different library versions! Also, the unzip_requirements.py strategy rocks.
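For anyone curious, the unzip_requirements pattern (from the serverless-python-requirements plugin) boils down to one import at the very top of the handler; a minimal sketch, assuming the plugin's zip option is enabled:

# handler.py - minimal sketch of the unzip_requirements pattern
try:
    # generated by serverless-python-requirements when `zip: true` is set;
    # importing it extracts the zipped dependencies into /tmp on first run
    import unzip_requirements
except ImportError:
    pass  # running locally, where dependencies are installed normally

import torch  # heavyweight imports only resolve after the unzip step above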

I'm attempting to modify the IMDB example and deploy to lambda as well.

I'm having a similar issue. I'm really curious how you got past this.

Interesting. What made you think it wasn't supported?

Hi everybody! Thanks Matt for the guidelines, they helped a lot.
However, I'm stuck at this point when running locally:

sam local invoke PyTorchFunction -n env.json -e event.json

which throws this error:

File "/home/michal/.local/lib/python3.7/site-packages/botocore/client.py", line 586, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidParameterValueException: An error occurred (InvalidParameterValueException) when calling the GetLayerVersion operation: Invalid Layer name: arn:aws:lambda:eu-west-1:934676248949:layer:pytorchv1-py36

I am using your YAML template and just changed the region:
Default: "arn:aws:lambda:eu-west-1:934676248949:layer:pytorchv1-py36:1"

My S3 bucket is in this same region. I have tried many things but cannot find the problem. Does anyone have an idea?

Update:
I have created my own PyTorch layer, but I still get the same error message.

Update 2:
My awscli was configured for the wrong region :grinning:
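For anyone hitting the same InvalidParameterValueException: it is worth checking which region your CLI/SDK actually resolves before anything else. A quick hedged check with boto3 (which reads the same configuration as awscli):

import boto3

# the region in the layer ARN (here eu-west-1) must match the region the
# CLI/SDK is configured for, otherwise GetLayerVersion fails as above
session = boto3.session.Session()
print(session.region_name)  # should print "eu-west-1" for the ARN above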

Hi Everybody,

Big THANKS to @matt.mcclean for putting this Lambda deployment guide together.

I'm super-excited about being able to deploy with Lambda, but I'm running into this problem when testing locally or after deploying:
/tmp/sls-py-req/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.7) or chardet (3.0.4) doesn't match a supported version!

Eternal gratitude if anybody can help solve this.

Interested to know if anyone has deployed a fastai text classifier (v1 pretrained AWD_LSTM + fine-tuning) to Lambda + API Gateway using the TorchScript method from the course docs.

I've trained a product categorization model which I'd really like to be able to deploy for an internal POC.

I'm currently hitting issues with the torch.jit.trace part. The course docs give the example below, whose shape I understand (it's for images). When trying to modify this for a text classifier, I'm not sure what to set these values to.

# example from the docs
trace_input = torch.ones(1,3,299,299).cuda()
jit_model = torch.jit.trace(learn.model.float(), trace_input)

The model just takes in a single string and outputs predictions across 817 classes.

I've tried changing this to the below and got the following error.

# example I've tried
trace_input = torch.ones(1,817).cuda()
jit_model = torch.jit.trace(learn.model.float(), trace_input)

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)

I've tried other combinations of this but have largely gotten the same error.

I've searched these forums and Google but can't find anything that really helps.

Any help would be greatly appreciated (even if it's simply "no, this method doesn't work for text models because of xyz").

My learner doesn't have anything special; it's mostly the example text classifier from the docs.
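For reference, the dtype in the error suggests the embedding layer wants Long-typed token indices rather than a float tensor. Here is a hedged sketch of what I'd expect the trace input to look like (the sequence length of 70 is an arbitrary assumption, and tracing may still fail on the data-dependent control flow inside AWD_LSTM):

import torch

# embeddings index into the vocabulary, so the dummy input must be token
# IDs (a LongTensor), not floats; shape = (batch size, sequence length)
trace_input = torch.zeros(1, 70, dtype=torch.long).cuda()
jit_model = torch.jit.trace(learn.model.float(), trace_input)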


A while ago (maybe a year back) I know it didn't work for text models. Haven't looked at it since.

Thanks for the reply Matt.

I suspected it didn't, as I saw some discussions on the PyTorch forums, but I didn't really understand the reasons given, so I wasn't sure whether it was an issue for me or not. I ended up starting with the Render starter repo, then scrapping the UI part and writing a small API with FastAPI.

For anyone else who primarily needs a backend web service rather than a fancy UI, I can recommend this approach: FastAPI gives you a nice Swagger interface with little effort, which makes testing the API endpoints easy.
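A minimal sketch of what this looks like, assuming a fastai v1 learner saved with learn.export() to export.pkl (the path, route, and field names are my own placeholders):

from fastai.text import *   # fastai v1; provides load_learner
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
learn = load_learner(".", "export.pkl")  # hypothetical export location

class Query(BaseModel):
    text: str

@app.post("/predict")
def predict(query: Query):
    # learn.predict returns (category, class index, per-class probabilities)
    pred_class, pred_idx, probs = learn.predict(query.text)
    return {"category": str(pred_class), "confidence": float(probs[pred_idx])}

Run it with uvicorn and the interactive Swagger docs appear at /docs with no extra work.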

Interesting to hear you're using FastAPI - that's what we (@aychang) opted for when we created AdaptNLP, which is our fastai-inspired wrapper for Hugging Face and Flair.


Hi everybody,

Many thanks to @matt.mcclean for the guide, it helped a lot.
However, I'm stuck: when testing locally I get an import error for "libtorch.so".

I'm using a custom Lambda layer, as the links provided do not work for me.
In "create_layer_zipfile.sh" I saw that "libtorch.so" is removed:

rm ./torch/lib/libtorch.so 

Not removing it creates a zip file that exceeds 250 MB (the AWS Lambda limit); removing it gives the error mentioned above.

The only difference from Matt's code is that I have changed the download link in the requirements to torch 1.2.0.

What am I missing?

Hi, I've been trying to get this to work but I'm having a problem when Lambda tries to open the model.
Here is the log grabbed from AWS CloudWatch; it looks like PyTorch can't read the model properly? I trained this model using PyTorch 1.4 and Python 3.7.
Would it make sense to downgrade to PyTorch 1.1 and reconstruct the model from there?

[INFO] 2020-05-07T17:04:57.431Z Loading model from S3
17:04:59 Model file is : res50_stage1_v4.pth
17:04:59 Loading PyTorch model
17:05:00 module initialization error: version_number <= kMaxSupportedFileFormatVersion ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:131, please report a bug to PyTorch. Attempted to read a PyTorch file with version 2, but the maximum supported version for reading is 1. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:131)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f6d223d9441 in /tmp/sls-py-req/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f6d223d8d7a in /tmp/sls-py-req/torch/lib/libc10.so)
frame #2: caffe2::serialize::PyTorchStreamReader::init() + 0xed1 (0x7f6d23fc8431 in /tmp/sls-py-req/torch/lib/libcaffe2.so)
frame #3: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::istream*) + 0x48 (0x7f6d23fc90f8 in /tmp/sls-py-req/torch/lib/libcaffe2.so)
frame #4: torch::jit::import_ir_module(std::function<std::shared_ptr<torch::jit::script::Module> (std::vector<std::string, std::allocator<std
17:05:00 END RequestId: 8cafea90-765c-4a19-8ce0-8525956ad0ce
17:05:00 REPORT RequestId: 8cafea90-765c-4a19-8ce0-8525956ad0ce Duration: 149.70 ms Billed Duration: 200 ms Memory Size: 3008 MB Max Memory Used: 446 MB
17:05:00 /tmp/sls-py-req/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.8) or chardet (3.0.4) doesn't match a supported version!

I had the exact same problem, and downgrading to torch 1.1.0 solved it. I think the format of the JIT trace file changed between PyTorch versions.

To downgrade for this export without breaking my fastai2 install, I made a new conda environment as follows:

conda create -n pytorch11 python=3.6 pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0
conda activate pytorch11
conda install -c pytorch -c fastai fastai pytorch=1.1.0 torchvision=0.3.0 cuda100 jupyter jupyterlab
conda install boto3
jupyter notebook password
jupyter notebook --ip=10.0.1.100
python -c "import torch; print(torch.__version__)"  # should print 1.1.0

I then imported the old model in a Jupyter notebook with:
from fastai.vision import *
classes = ['Anger', 'Disgust', 'Surprise', 'Sadness', 'Happiness', 'Neutral', 'Contempt', 'Fear']
data = ImageDataBunch.single_from_classes('', classes, ds_tfms=None)
learner = cnn_learner(data, models.resnet34)
learner.load('gokul-sentiment-stage-5n')

... and from here on I followed the description at https://course.fast.ai/deployment_aws_lambda.html, starting from "Export your trained model and upload to S3".
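For completeness, a hedged sketch of that export step inside the pytorch11 environment (the input shape follows the guide's image example; the filename and bucket are placeholders of mine):

import boto3
import torch

# trace with a dummy image batch and save the TorchScript module
trace_input = torch.ones(1, 3, 299, 299).cuda()
jit_model = torch.jit.trace(learner.model.float(), trace_input)
jit_model.save("model.pth")  # ScriptModule.save works on torch 1.1

# upload to the bucket the Lambda loads from
boto3.client("s3").upload_file("model.pth", "my-model-bucket", "model.pth")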

Thanks so much!
I'll give that a shot. For now, to get past this problem, I went with Google Cloud Functions instead. It seems slow (800 ms per inference on a ResNet50), so I want to give AWS Lambda another try.