Platform: SageMaker ✅

I have uploaded another example here. It is a notebook that shows how to train your fast.ai model on a SageMaker notebook instance, upload the model to S3, and deploy it as a SageMaker endpoint. You can also deploy the endpoint locally to test.

There is no support for Elastic Inference with PyTorch models as far as I know. In any case, you can launch an endpoint with a CPU-based instance (instead of GPU). We recently announced support for ONNX models with Elastic Inference. An example is shown here.
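
For reference, a minimal sketch of launching such an endpoint on a CPU instance with the SageMaker Python SDK (the S3 path, IAM role, and entry point script below are placeholders):

    from sagemaker.pytorch import PyTorchModel

    # Placeholder model artifact, IAM role, and inference script
    model = PyTorchModel(
        model_data='s3://my-bucket/model.tar.gz',
        role='MySageMakerRole',
        entry_point='serve.py',
        framework_version='1.0.0',
    )

    # Deploy to a CPU instance type instead of a GPU one
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type='ml.m5.large',
    )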

Thanks @matt.mcclean. Hope EI will support fastai soon. BTW, I cannot see the example in the link you provided.

GitHub keeps complaining that something went wrong and needs a reload. Tried a few times but no luck. Wondering if it's just my Mac.

Gen

Try it again, as it worked for me.


@matt.mcclean Thanks for providing this info. After reviewing it, it seems we jump through a lot of hoops to use SageMaker’s predefined interface, and I am not sure it can scale since it does not support EI. Do you think it is better to package everything (fastai/model/data) in Docker and deploy it with Lambda/API Gateway? I might be wrong here; please feel free to correct me.

Gen

Not sure what you mean by it won’t scale because it doesn’t support EI. It will depend on the type of fast.ai model and your latency and cost constraints. You can run fast.ai models on non-GPU instances (e.g. m4, m5, c5, etc.) but you may see increased latency on your inference calls compared to a GPU or EI based instance.
You can certainly scale out SageMaker endpoints horizontally using the Auto Scaling feature (see the sketch below).

I would recommend testing your model with different instance types to find the optimal one, taking into consideration both performance/latency and cost.
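
To make the autoscaling point concrete, here is a rough boto3 sketch of a target-tracking policy on an endpoint variant (the endpoint and variant names are placeholders):

    import boto3

    autoscaling = boto3.client('application-autoscaling')

    # Register the endpoint variant as a scalable target (1 to 4 instances)
    autoscaling.register_scalable_target(
        ServiceNamespace='sagemaker',
        ResourceId='endpoint/my-fastai-endpoint/variant/AllTraffic',
        ScalableDimension='sagemaker:variant:DesiredInstanceCount',
        MinCapacity=1,
        MaxCapacity=4,
    )

    # Scale on the number of invocations each instance handles per minute
    autoscaling.put_scaling_policy(
        PolicyName='fastai-invocations-policy',
        ServiceNamespace='sagemaker',
        ResourceId='endpoint/my-fastai-endpoint/variant/AllTraffic',
        ScalableDimension='sagemaker:variant:DesiredInstanceCount',
        PolicyType='TargetTrackingScaling',
        TargetTrackingScalingPolicyConfiguration={
            'TargetValue': 100.0,
            'PredefinedMetricSpecification': {
                'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'
            },
        },
    )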


I figured out the issue: other people in my company also use SageMaker and they’re trashing my envs. The “Python3” environment already existed, so the lifecycle scripts did not configure that environment correctly.

Is there anyone who can tell me how to modify my startup scripts etc. to provide a clean environment, and hopefully name it something unique so other people will not trample it? I did quite a bit of googling on conda/source/activate etc. but I honestly can’t make it all work on my own. I’m not super familiar with *nix environments, and this is my first look at Python, which Google tells me has a pretty rough time with these exact problems. :frowning:

Any help would be appreciated.

@sublimemm if you check the lifecycle scripts provided by @matt.mcclean, I think you want to modify line 57 of https://course-v3.fast.ai/setup/sagemaker-create to use a different display name.

I have just published a new Production guide for deploying your fast.ai model on SageMaker here: https://course.fast.ai/deployment_amzn_sagemaker.html


The instructions to set up a SageMaker notebook are now even easier and faster. We can now provision all the resources with a CloudFormation script, avoiding manual steps.

The setup guide has been updated here: https://course.fast.ai/start_sagemaker.html
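
If you prefer launching the stack from code rather than the console, a minimal boto3 sketch (the stack name and template URL are placeholders):

    import boto3

    cfn = boto3.client('cloudformation')

    # Launch the stack; CAPABILITY_IAM is likely needed if the
    # template creates IAM roles for the notebook instance
    cfn.create_stack(
        StackName='fastai-sagemaker',
        TemplateURL='https://s3.amazonaws.com/my-bucket/sagemaker-notebook.yaml',
        Capabilities=['CAPABILITY_IAM'],
    )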


I set up SageMaker and was very careful to start and stop the instance. I did not even use it much, as I got busy with something else. Yet I just got a $300 bill, and I am not even sure why; there is no detail of what exactly caused it. I thought I was being very careful: how could I have run up such costs without even noticing?
Anyway, one piece of advice: set a budget limit alarm!
Personally, I think that when you first start using SageMaker you should be alerted by default. Up to you to raise your budget alarm…
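
For anyone setting this up, a boto3 sketch of a billing alarm (the threshold and SNS topic ARN are placeholders; billing alerts must be enabled on the account, and billing metrics only live in us-east-1):

    import boto3

    # Billing metrics are only published to us-east-1
    cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

    cloudwatch.put_metric_alarm(
        AlarmName='monthly-spend-alarm',
        Namespace='AWS/Billing',
        MetricName='EstimatedCharges',
        Dimensions=[{'Name': 'Currency', 'Value': 'USD'}],
        Statistic='Maximum',
        Period=21600,                # check every 6 hours
        EvaluationPeriods=1,
        Threshold=50.0,              # alert once estimated charges pass $50
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=['arn:aws:sns:us-east-1:123456789012:billing-alerts'],
    )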

Go to your AWS billing dashboard; it has every cent detailed.

Hi, I have tried to follow the SageMaker setup here: https://course.fast.ai/start_sagemaker.html. I think that AWS has changed their UI and that the setup instructions are (perhaps) no longer valid. AWS keeps prompting me for some S3 bucket where the CloudFormation template is stored.


Hey @matt.mcclean, would you happen to have any examples of distributed or parallel training with SageMaker and PyTorch/fast.ai? I made some attempts with nn.DataParallel(model) on an ml.p2.8xlarge, but it actually made things slower. I’m not sure if it is because I set the model to the devices incorrectly (I tried following this code: https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html#create-model-and-dataparallel) or because of my model architecture (I’m doing some negative sampling).
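
For reference, the basic pattern from that tutorial looks roughly like this (nn.Linear is just a stand-in for the real architecture); my understanding is that DataParallel only pays off when the batch is large enough to keep every GPU busy, so the batch size may need to grow with the GPU count:

    import torch
    import torch.nn as nn

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    model = nn.Linear(10, 2)  # stand-in for the real model
    if torch.cuda.device_count() > 1:
        # DataParallel splits each input batch across the available GPUs,
        # so small batches leave devices idle and add scatter/gather overhead
        model = nn.DataParallel(model)
    model.to(device)

    # Inputs go to the base device; DataParallel scatters them internally
    inputs = torch.randn(64, 10).to(device)
    outputs = model(inputs)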


For those completely new to AWS SageMaker wanting to use fastai, this is a valuable resource: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-examples.html
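
The core upload/download calls from that guide boil down to a few lines, e.g. for moving a trained model around (the bucket and key names are placeholders):

    import boto3

    s3 = boto3.client('s3')

    # Upload a trained model artifact to S3
    s3.upload_file('export.pkl', 'my-bucket', 'models/export.pkl')

    # Download it back to the local disk
    s3.download_file('my-bucket', 'models/export.pkl', 'export.pkl')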


I am facing network latency when using the AWS Java SDK… can anyone help me reduce it?

    // Time the full invokeEndpoint round trip (AWS Java SDK v1; needs
    // com.amazonaws.services.sagemakerruntime.* and java.nio.ByteBuffer imports)
    long invokeStart = System.currentTimeMillis();
    InvokeEndpointRequest request = new InvokeEndpointRequest()
            .withEndpointName("<endpoint>")
            .withAccept("application/json")
            .withContentType("application/json")
            .withBody(ByteBuffer.wrap(data.getBytes()));
    InvokeEndpointResult result = amazonSageMakerRuntime.invokeEndpoint(request);
    System.out.println("Invoke time : " + (System.currentTimeMillis() - invokeStart));

Getting total time as -
Invoke time : 7551

Hello, I followed the instructions on setting up the stack in CloudFormation, but I am receiving the error message ‘ROLLBACK_COMPLETE’ and no notebook has been created in SageMaker. Could anyone advise me on a solution? Thanks!

[screenshot: FastaiError]


Hi everyone :slight_smile: I have faced the same problem as @WingT; I am stuck in ROLLBACK_COMPLETE.


Hi guys, just want to show you a quick, easy way to deploy to SageMaker.

We made BentoML, an open-source Python library for creating/shipping/running ML services in production: github.com/bentoml/bentoml

After you finish training your model, you define and archive it as an ML service:

%%writefile my_service.py

from bentoml import api, artifacts, env, BentoService
from bentoml.artifact import FastaiModelArtifact
from bentoml.handler import DataframeHandler

@env(conda_pip_dependencies=['fastai'])
@artifacts([FastaiModelArtifact('model')])
class MyFastaiService(BentoService):
    @api(DataframeHandler)
    def predict(self, df):
        return self.artifacts.model.predict(df)

Archive it in the next cell:

from my_service import MyFastaiService

# Bundle the trained fastai learner into the service archive
service = MyFastaiService.pack(model=learner)
saved_path = service.save('/local/path')

Now, after archiving the ML service, we can deploy to SageMaker with a single CLI command:

$ bentoml deploy {saved_path} --platform=aws-sagemaker

Here is the full example: https://github.com/bentoml/BentoML/tree/master/examples/deploy-with-sagemaker

Let me know what you guys think; would love to get feedback!

Cheers

Bo

Problem with SageMaker and running @matt.mcclean’s code. I did the latest install, went to the examples, and tried using lesson 4 tabular.

The actual error is this:

ContextualVersionConflict: (requests 2.22.0 (/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages), Requirement.parse('requests<2.21,>=2.20.0'), {'sagemaker'})

I have gotten import sagemaker to work and pulled data in from an S3 bucket on other instances, so I'm not sure what is going on here.

Edit: I deleted my instance while I try to figure out what is going on.

Hi - looking for help with an “ImportError: … undefined symbol” message. I used this tutorial, https://course.fast.ai/start_sagemaker.html, to create an instance. Going through the 00_notebook, I ran the cell

# Import necessary libraries
from fastai.vision import * 
import matplotlib.pyplot as plt

and got the following error:

ImportError: /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3c107Warning4warnENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Here’s the full traceback: