Platform: SageMaker ✅

I have uploaded another example here. It is a notebook that shows how to train your fast.ai model on a SageMaker notebook instance, upload the model to S3, and deploy it as a SageMaker endpoint. You can also deploy the endpoint locally to test.

There is no support for Elastic Inference with PyTorch models as far as I know. In any case, you can launch an endpoint with a CPU-based instance (instead of GPU). We recently announced support for ONNX models with Elastic Inference. An example is shown here.
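
For reference, a minimal sketch of launching such an endpoint on a CPU instance with the SageMaker Python SDK (the S3 path, IAM role, and entry point script below are placeholders):

    from sagemaker.pytorch import PyTorchModel

    # Placeholder model artifact, IAM role, and inference script
    model = PyTorchModel(
        model_data='s3://my-bucket/model.tar.gz',
        role='MySageMakerRole',
        entry_point='serve.py',
        framework_version='1.0.0',
    )

    # Deploy to a CPU instance type instead of a GPU one
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type='ml.m5.large',
    )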

Thanks @matt.mcclean. Hope EI will support fastai soon. BTW, I cannot see the example in the link you provided.

GitHub keeps complaining that something went wrong and needs a reload. Tried a few times but no luck. Wondering if it's just my Mac.

Gen

Try it again, as it worked for me.


@matt.mcclean Thanks for providing this info. After reviewing it, it seems we jump through a lot of hoops to use SageMaker’s predefined interface, and I am not sure it can scale since it does not support EI. Do you think it is better to package everything (fastai/model/data) in Docker and deploy it with Lambda/API Gateway? I might be wrong here; please feel free to correct me.

Gen

Not sure what you mean by it won’t scale because it doesn’t support EI. It will depend on the type of fast.ai model and your latency and cost constraints. You can run fast.ai models on non-GPU instances (e.g. m4, m5, c5, etc.) but you may see increased latency on your inference calls compared to a GPU or EI based instance.
You can certainly scale out SageMaker endpoints horizontally using the Auto Scaling feature (see the sketch below).

I would recommend testing your model with different instance types to find the optimal one, taking into consideration both performance/latency and cost.
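
To make the autoscaling point concrete, here is a rough boto3 sketch of a target-tracking policy on an endpoint variant (the endpoint and variant names are placeholders):

    import boto3

    autoscaling = boto3.client('application-autoscaling')

    # Register the endpoint variant as a scalable target (1 to 4 instances)
    autoscaling.register_scalable_target(
        ServiceNamespace='sagemaker',
        ResourceId='endpoint/my-fastai-endpoint/variant/AllTraffic',
        ScalableDimension='sagemaker:variant:DesiredInstanceCount',
        MinCapacity=1,
        MaxCapacity=4,
    )

    # Scale on the number of invocations each instance handles per minute
    autoscaling.put_scaling_policy(
        PolicyName='fastai-invocations-policy',
        ServiceNamespace='sagemaker',
        ResourceId='endpoint/my-fastai-endpoint/variant/AllTraffic',
        ScalableDimension='sagemaker:variant:DesiredInstanceCount',
        PolicyType='TargetTrackingScaling',
        TargetTrackingScalingPolicyConfiguration={
            'TargetValue': 100.0,
            'PredefinedMetricSpecification': {
                'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'
            },
        },
    )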


I figured out the issue: other people in my company also use SageMaker and they’re trashing my envs. The “Python3” environment already existed, so the lifecycle scripts did not configure that environment correctly.

Is there anyone who can tell me how to modify my startup scripts etc. to provide a clean environment, and hopefully name it something unique so other people will not trample it? I did quite a bit of googling on conda/source/activate etc. but I honestly can’t make it all work on my own. I’m not super familiar with *nix environments, and this is my first look at Python, which Google tells me has a pretty rough time with these exact problems. :frowning:

Any help would be appreciated.

@sublimemm if you check the lifecycle scripts provided by @matt.mcclean, I think you want to modify line 57 of https://course-v3.fast.ai/setup/sagemaker-create to use a different display name.

I have just published a new Production guide for deploying your fast.ai model on SageMaker here: https://course.fast.ai/deployment_amzn_sagemaker.html


The instructions to set up a SageMaker notebook are now even easier and faster. We can now provision all the resources with a CloudFormation script, avoiding manual steps.

The setup guide has been updated here: https://course.fast.ai/start_sagemaker.html
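
If you prefer launching the stack from code rather than the console, a minimal boto3 sketch (the stack name and template URL are placeholders):

    import boto3

    cfn = boto3.client('cloudformation')

    # Launch the stack; CAPABILITY_IAM is likely needed if the
    # template creates IAM roles for the notebook instance
    cfn.create_stack(
        StackName='fastai-sagemaker',
        TemplateURL='https://s3.amazonaws.com/my-bucket/sagemaker-notebook.yaml',
        Capabilities=['CAPABILITY_IAM'],
    )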


I set up SageMaker and was very careful to start and stop the instance. I did not even use it much, as I got busy with something else. Yet I just got a $300 bill, and I am not even sure why; there is no detail of what exactly caused it. I thought I was being very careful: how could I have run up such costs without even noticing?
Anyway, one piece of advice: set a budget limit alarm!
Personally, I think that when you first start using SageMaker you should be alerted by default. Up to you to raise your budget alarm…
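
For anyone setting this up, a boto3 sketch of a billing alarm (the threshold and SNS topic ARN are placeholders; billing alerts must be enabled on the account, and billing metrics only live in us-east-1):

    import boto3

    # Billing metrics are only published to us-east-1
    cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

    cloudwatch.put_metric_alarm(
        AlarmName='monthly-spend-alarm',
        Namespace='AWS/Billing',
        MetricName='EstimatedCharges',
        Dimensions=[{'Name': 'Currency', 'Value': 'USD'}],
        Statistic='Maximum',
        Period=21600,                # check every 6 hours
        EvaluationPeriods=1,
        Threshold=50.0,              # alert once estimated charges pass $50
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=['arn:aws:sns:us-east-1:123456789012:billing-alerts'],
    )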

Go to your AWS billing dashboard; it has every cent detailed.

Hi, I have tried to follow the SageMaker setup here: https://course.fast.ai/start_sagemaker.html. I think that AWS has changed their UI and that the setup instructions are (perhaps) no longer valid. AWS keeps prompting me for some S3 bucket where the CloudFormation template is stored.


Hey @matt.mcclean, would you happen to have any examples of distributed or parallel training with SageMaker and PyTorch/fast.ai? I made some attempts with nn.DataParallel(model) on an ml.p2.8xlarge, but it actually made things slower. I’m not sure if it is because I set the model to the devices incorrectly (I tried following this code: https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html#create-model-and-dataparallel) or because of my model architecture (I’m doing some negative sampling).
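
For reference, the basic pattern from that tutorial looks roughly like this (nn.Linear is just a stand-in for the real architecture); my understanding is that DataParallel only pays off when the batch is large enough to keep every GPU busy, so the batch size may need to grow with the GPU count:

    import torch
    import torch.nn as nn

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    model = nn.Linear(10, 2)  # stand-in for the real model
    if torch.cuda.device_count() > 1:
        # DataParallel splits each input batch across the available GPUs,
        # so small batches leave devices idle and add scatter/gather overhead
        model = nn.DataParallel(model)
    model.to(device)

    # Inputs go to the base device; DataParallel scatters them internally
    inputs = torch.randn(64, 10).to(device)
    outputs = model(inputs)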


For those completely new to AWS SageMaker wanting to use fastai, this is a valuable resource: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-examples.html
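
The core upload/download calls from that guide boil down to a few lines, e.g. for moving a trained model around (the bucket and key names are placeholders):

    import boto3

    s3 = boto3.client('s3')

    # Upload a trained model artifact to S3
    s3.upload_file('export.pkl', 'my-bucket', 'models/export.pkl')

    # Download it back to the local disk
    s3.download_file('my-bucket', 'models/export.pkl', 'export.pkl')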


I am facing network latency when using the AWS Java SDK… can anyone help me reduce it?

    // Time the full invokeEndpoint round trip (AWS Java SDK v1; needs
    // com.amazonaws.services.sagemakerruntime.* and java.nio.ByteBuffer imports)
    long invokeStart = System.currentTimeMillis();
    InvokeEndpointRequest request = new InvokeEndpointRequest()
            .withEndpointName("<endpoint>")
            .withAccept("application/json")
            .withContentType("application/json")
            .withBody(ByteBuffer.wrap(data.getBytes()));
    InvokeEndpointResult result = amazonSageMakerRuntime.invokeEndpoint(request);
    System.out.println("Invoke time : " + (System.currentTimeMillis() - invokeStart));

Getting total time as -
Invoke time : 7551

Hello, I followed the instructions on setting up the stack in CloudFormation, but I am receiving the error message ‘ROLLBACK_COMPLETE’ and no notebook has been created in SageMaker. Could anyone advise me on a solution? Thanks!

[screenshot: FastaiError]


Hi everyone :slight_smile: I have faced the same problem as @WingT; I am stuck in ROLLBACK_COMPLETE.


Hi guys, just want to show you a quick, easy way to deploy to SageMaker.

We made BentoML, an open-source Python library for creating/shipping/running ML services in production: github.com/bentoml/bentoml

After you finish training your model, you define and archive it as an ML service:

%%writefile my_service.py

from bentoml import api, artifacts, env, BentoService
from bentoml.artifact import FastaiModelArtifact
from bentoml.handler import DataframeHandler

@env(conda_pip_dependencies=['fastai'])
@artifacts([FastaiModelArtifact('model')])
class MyFastaiService(BentoService):
    @api(DataframeHandler)
    def predict(self, df):
        return self.artifacts.model.predict(df)

Archive it in the next cell:

from my_service import MyFastaiService

# Bundle the trained fastai learner into the service archive
service = MyFastaiService.pack(model=learner)
saved_path = service.save('/local/path')

Now, after archiving the ML service, we can deploy to SageMaker with a single CLI command:

$ bentoml deploy {saved_path} --platform=aws-sagemaker

Here is the full example: https://github.com/bentoml/BentoML/tree/master/examples/deploy-with-sagemaker

Let me know what you guys think; would love to get feedback!

Cheers

Bo

Problem with SageMaker and running @matt.mcclean’s code. I did the latest install, went to the examples, and tried using lesson 4 tabular.

The actual error is this:

ContextualVersionConflict: (requests 2.22.0 (/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages), Requirement.parse('requests<2.21,>=2.20.0'), {'sagemaker'})

I have gotten import sagemaker to work and pulled data in from an S3 bucket on other instances, so I'm not sure what is going on here.

Edit: I deleted my instance while I try to figure out what is going on.

Hi - looking for help with an “ImportError: … undefined symbol” message. I used this tutorial, https://course.fast.ai/start_sagemaker.html, to create an instance. Going through the 00_notebook, I ran the cell

# Import necessary libraries
from fastai.vision import * 
import matplotlib.pyplot as plt

and got the following error:

ImportError: /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3c107Warning4warnENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Here’s the full traceback: