Platform: SageMaker ✅

(Matt McClean) #104

I have uploaded another example here. It is a notebook that shows how to train your model on a SageMaker notebook instance, then upload the model to S3 and deploy it as a SageMaker endpoint. You can also deploy the endpoint locally to test.

There is no support for Elastic Inference with PyTorch models as far as I know. In any case, you can launch an endpoint with a CPU-based instance (instead of GPU). We recently announced support for ONNX models with Elastic Inference. An example is shown here.
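As a rough sketch of the upload step (the bucket, key, and file names below are placeholders, not taken from the example notebook): SageMaker endpoints expect the model artifacts packaged as a `model.tar.gz` in S3, which you could build and upload like this:

```python
import tarfile
from pathlib import Path


def make_model_archive(model_dir, out_path="model.tar.gz"):
    """Package a directory of model artifacts (e.g. an exported fastai
    model) into the model.tar.gz layout SageMaker endpoints expect."""
    with tarfile.open(out_path, "w:gz") as tar:
        for f in Path(model_dir).iterdir():
            # arcname keeps files at the archive root, as SageMaker expects
            tar.add(f, arcname=f.name)
    return out_path


def upload_to_s3(archive, bucket, key="models/model.tar.gz"):
    # boto3 imported lazily; this call needs AWS credentials
    import boto3
    boto3.client("s3").upload_file(archive, bucket, key)
    return f"s3://{bucket}/{key}"
```

The returned `s3://...` URI is what you would then point the endpoint's `model_data` at when creating it.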



Thanks @matt.mcclean. Hope EI will support fastai soon. BTW, I cannot see the example in the link you provided.

GitHub keeps complaining that "Something went wrong" and asking me to reload. I tried a few times but no luck. Wondering if it's just my Mac.



(Matt McClean) #106

Try it again; it worked for me.



@matt.mcclean Thanks for providing this info. After reviewing it, it seems we have to jump through a lot of hoops to use SageMaker’s predefined interface, and I am not sure it can scale since it does not support EI. Do you think it is better to package everything (fastai/model/data) in Docker and deploy it with Lambda/API Gateway? I might be wrong here; please feel free to correct me.



(Matt McClean) #108

Not sure what you mean by it won’t scale because it doesn’t support EI. It will depend on the type of model and your latency and cost constraints. You can run models on non-GPU instances (e.g. m4, m5, c5, etc.), but you may see higher latency on your inference calls compared to a GPU or EI based instance.
You can certainly scale out SageMaker endpoints horizontally using the Autoscaling feature.

I would recommend testing your model with different instance types to find the optimal one, taking into consideration both performance/latency and cost.
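The autoscaling setup can be sketched with boto3’s Application Auto Scaling client. The endpoint name, capacity bounds, and target value below are placeholders you would tune for your own workload:

```python
def variant_resource_id(endpoint_name, variant_name="AllTraffic"):
    """Build the resource id Application Auto Scaling uses for an
    endpoint's production variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def enable_autoscaling(endpoint_name, min_capacity=1, max_capacity=4):
    # boto3 imported lazily; registering the target needs AWS credentials
    import boto3
    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=variant_resource_id(endpoint_name),
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    # Track requests per instance; 100 is a starting guess, not a recommendation
    client.put_scaling_policy(
        PolicyName="invocations-target-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=variant_resource_id(endpoint_name),
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 100.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )
```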



I figured out the issue: other people in my company also use SageMaker, and they’re trashing my envs. The “Python3” environment already existed, so the lifecycle scripts did not configure that environment correctly.

Is there anyone who can tell me how to modify my startup scripts etc. to provide a clean environment, and ideally name it something unique so other people will not trample it? I did quite a bit of googling on conda/source/activate etc., but I honestly can’t make it all work on my own. I’m not super familiar with *nix environments, and this is my first look at Python, which Google tells me has a pretty rough time with these exact problems. :frowning:

Any help would be appreciated
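A minimal sketch of the naming part (the prefix and Python version here are assumptions, not taken from the actual lifecycle scripts): derive the env name from your username, so a lifecycle on-create script can build an environment nobody else will trample:

```python
import getpass


def unique_env_name(prefix="fastai"):
    """Per-user env name so colleagues' lifecycle scripts don't
    clobber each other's conda environments."""
    try:
        user = getpass.getuser()
    except Exception:  # no login name available (e.g. bare containers)
        user = "user"
    return f"{prefix}-{user}"


def conda_create_cmd(env_name, python_version="3.7"):
    """Command a lifecycle on-create script could pass to subprocess.run
    (not executed here)."""
    return ["conda", "create", "-y", "-n", env_name, f"python={python_version}"]
```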



@sublimemm if you check the lifecycle scripts provided by @matt.mcclean, I think you want to modify line 57 to use a different display name.


(Matt McClean) #111

I have just published a new Production guide for deploying your model on SageMaker here:


(Matt McClean) #112

The instructions to set up a SageMaker notebook are now even easier and faster. We can now provision all the resources with a CloudFormation script, avoiding manual steps.

The setup guide has been updated here:
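For anyone curious what such a template contains, a minimal hypothetical fragment for the notebook instance resource might look like the following. The logical names, instance type, and referenced role/lifecycle resources are placeholders, not taken from the actual template:

```yaml
Resources:
  FastaiNotebookInstance:
    Type: AWS::SageMaker::NotebookInstance
    Properties:
      InstanceType: ml.p2.xlarge
      RoleArn: !GetAtt NotebookExecutionRole.Arn   # IAM role defined elsewhere in the template
      LifecycleConfigName: !GetAtt NotebookLifecycleConfig.NotebookInstanceLifecycleConfigName
```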



I set up SageMaker and was very careful to start and stop the instance. I did not even use it much, as I ended up working on something else. Yet I just got a $300 bill, and I am not even sure why; there is no detail of what exactly caused it. I thought I was being very careful: how could I have run up such costs without even noticing?
Anyway, one piece of advice: set a budget limit alarm!
Personally, I think that when you first start using SageMaker you should be alerted by default, and it should be up to you to raise your budget alarm.
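Something like this boto3 sketch sets one up. The SNS topic ARN is a placeholder; note that the estimated-charges metric lives in us-east-1 and billing alerts must first be enabled in the billing preferences:

```python
def billing_alarm_params(threshold_usd, sns_topic_arn):
    """Parameters for a CloudWatch alarm on the account's estimated charges."""
    return {
        "AlarmName": f"billing-over-{threshold_usd}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours; billing data only updates a few times a day
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that emails you
    }


def create_billing_alarm(threshold_usd, sns_topic_arn):
    # boto3 imported lazily; the call needs AWS credentials
    import boto3
    cw = boto3.client("cloudwatch", region_name="us-east-1")
    cw.put_metric_alarm(**billing_alarm_params(threshold_usd, sns_topic_arn))
```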



Go to your AWS billing dashboard; they have every cent detailed.


(Mark Hatcher) #115

Hi, I have tried to follow the SageMaker setup here: I think that AWS has changed their UI and that the setup instructions are (perhaps) no longer valid. AWS keeps prompting me for an S3 bucket where the CloudFormation template is stored.


(Phil Lynch) #116

Hey @matt.mcclean, would you happen to have any examples of distributed or parallel training with SageMaker and PyTorch? I made some attempts with nn.DataParallel(model) on an ml.p2.8xlarge, but it actually made things slower. I’m not sure if it is because I set the model to the devices incorrectly (I tried following this code: ) or because of my model architecture (I’m doing some negative sampling).
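For context, the usual pitfalls with nn.DataParallel are wrapping the model before moving it to the device, and keeping the per-step batch too small to feed 8 GPUs. A minimal sketch of the expected ordering, with a stand-in model and sizes (also worth noting: DataParallel replicates the module every forward pass, so for small models that overhead alone can make it slower than single-GPU):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A stand-in model; replace with your own architecture
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

# Move to the primary device *before* wrapping; DataParallel then
# replicates the module onto the other GPUs each forward pass
model = model.to(device)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

# The input batch is split across GPUs, so scale batch size with GPU
# count (e.g. 8x your single-GPU batch on a p2.8xlarge) or each GPU is underfed
x = torch.randn(32, 128, device=device)
out = model(x)
print(out.shape)  # torch.Size([32, 1])
```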



This is a beginner question, since I am new to AWS: I have my images in an S3 bucket, and I created a notebook in SageMaker. In there, I want to load my images into a DataBunch as we usually do. I get an error with ImageList.from_folder(path), but I did manage to get one image as follows:

import tempfile
import boto3
from fastai.vision import open_image

s3 = boto3.resource('s3', region_name='eu-central-1')
bucket = s3.Bucket('bucket_name')
obj = bucket.Object('class1/image_name.jpg')  # 'object' shadows a builtin
tmp = tempfile.NamedTemporaryFile(suffix='.jpg')
with open(tmp.name, 'wb') as f:
    f.write(obj.get()['Body'].read())  # stream the S3 object into the temp file
img = open_image(tmp.name)

How can I get them ALL efficiently into the DataBunch? In addition, I have managed to get a list of all the images in my S3 bucket. Thanks a lot!
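One approach, sketched below with placeholder bucket and prefix names, is to mirror the bucket to local disk first and then point ImageList.from_folder at the local copy:

```python
from pathlib import Path


def local_path_for(key, dest):
    """Map an S3 key like 'class1/img.jpg' to a local path under dest."""
    return Path(dest) / key


def download_prefix(bucket_name, prefix, dest):
    """Mirror every object under `prefix` into `dest`, preserving the
    class1/image.jpg folder layout so ImageList.from_folder can read it."""
    import boto3  # lazy import; listing/downloading needs AWS credentials
    bucket = boto3.resource("s3").Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix):
        target = local_path_for(obj.key, dest)
        target.parent.mkdir(parents=True, exist_ok=True)
        bucket.download_file(obj.key, str(target))
    return Path(dest)
```

After mirroring, something like `ImageList.from_folder(download_prefix('bucket_name', '', './data'))` should pick up the class folders as labels. For large datasets, running `aws s3 sync` from a terminal does the same mirroring with parallel transfers.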