Platform: SageMaker ✅

Any questions related to SageMaker can be posted here.
This first post will be wiki-fied for helpful references.

  • Tutorial to get started.
    • Note: The current “increase limits” documentation applies to EC2 and does not work with SageMaker. It will be updated once someone verifies the correct procedure (one way to check your current SageMaker limits from the CLI is sketched right after this list).
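
If you want to see your account’s current SageMaker limits from the CLI, here is a rough sketch using the Service Quotas API (the service code and the quota-name filter are assumptions; check them for your region):

  # list SageMaker quotas whose name mentions ml.p2.xlarge (name filter is an assumption)
  aws service-quotas list-service-quotas \
      --service-code sagemaker \
      --region us-west-2 \
      --query "Quotas[?contains(QuotaName, 'ml.p2.xlarge')].[QuotaName,Value]" \
      --output table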

Note that this is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better! To edit, click on the little pencil icon at the bottom of this post. Here’s a pic of what to look for:

[screenshot: the Discourse edit (pencil) button]

4 Likes

When running course v2, there was a step asking to set the kernel to conda_fastai. Is this the case here too? I couldn’t find the option in the dropdown.

Should we really have nothing in the “start notebook” configuration?

Late edit: no, see updated tutorial above

I encountered a resource limit error when trying to set up a SageMaker notebook instance.

To fix this, you can ask for a resource limit increase here: https://console.aws.amazon.com/support/home?region=us-west-2

For those of you taking the course in person, you can apply the AWS credits you received in your email here: https://console.aws.amazon.com/billing/home?region=us-west-2#/credits

P.S. If someone from AWS is reading this, is there a way we can increase this account-level service limit to 1 or 2 for fastai students?
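
For reference, if you’d rather file the increase from the CLI than the support console, a sketch along these lines might work (the quota code below is a placeholder you would need to look up first; the console link above is the documented route):

  # placeholder quota code (L-XXXXXXXX); look it up with list-service-quotas first
  aws service-quotas request-service-quota-increase \
      --service-code sagemaker \
      --quota-code L-XXXXXXXX \
      --desired-value 1 \
      --region us-west-2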

3 Likes

Did you need to request the AWS credits? How do you get those?

I think that’s specific to the in-person version of the course, sorry mate. I can remove that part of my post if it isn’t relevant to a larger audience.

You should use the limit request info here: https://course-v3.fast.ai/start_aws.html#step-2-request-service-limit

1 Like

One small note - those look like instructions for requesting an EC2 service limit increase, as opposed to SageMaker specifically. I wonder if internally AWS treats the p2.xlarge instances differently from the ml.p2.xlarge?

I also wonder if it’s worth adding a SageMaker-specific service limit increase section to http://course-v3.fast.ai/start_sagemaker.html as well?

Here’s what my case ended up looking like:

2 Likes

Yes, that would be a good idea. I didn’t realize that SageMaker has a default limit of zero - somehow mine was one already.

(If you happen to have time to help out, a PR to fix our docs would certainly be appreciated: https://github.com/fastai/course-v3/blob/master/docs/start_sagemaker.md )

1 Like

Done. (PR)
I still can’t consistently get the conda_fastai kernel to appear as an option.
In the v2 version, the setup instructions put all the setup in the “start” stage of the notebook, and going to the CloudWatch logs would let you see when the startup script completed.

Currently, I’ve been able to get conda_fastai once (after waiting about 15 minutes) but have not been able to replicate that success.
(Thanks)
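
For what it’s worth, you can still check the lifecycle-script output in CloudWatch; a sketch, assuming the usual /aws/sagemaker/NotebookInstances log group and an <instance-name>/LifecycleConfigOnStart stream (the instance name below is a placeholder):

  # dump the OnStart lifecycle log for a notebook instance (names are placeholders)
  aws logs get-log-events \
      --log-group-name /aws/sagemaker/NotebookInstances \
      --log-stream-name "my-notebook-instance/LifecycleConfigOnStart" \
      --region us-west-2 \
      --query 'events[].message' \
      --output text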

I have been getting this error while creating a notebook instance. Can anyone help me out?

Request an increase for the limit here. Note in your request that it is for the fast.ai course.

1 Like

While requesting, it asks for a resource type. What should I select?

EC2 instance iirc

But I am requesting a limit for Amazon SageMaker.

That was my answer.
See https://aws.amazon.com/ec2/instance-types/ under Accelerated Computing.

Another question: does anyone know how to access files from Google Drive through SageMaker?

No, you should follow the tutorial in the top post of this thread. Any other tutorials you find are for old versions, and shouldn’t be used.

The current docs only work when starting a notebook for the first time.
Some of the scripts (which take a second to run) should run on every startup.

I can’t quite figure it out. If I leave the startup script empty, then open a shell and type

  # the persistent SageMaker volume, where the fastai conda env lives under envs/
  cd /home/ec2-user/SageMaker
  # activate the env by its relative path
  source activate envs/fastai
  # register it as a Jupyter kernel (shown as “Python 3”) for the current user
  ipython kernel install --name 'fastai' --display-name 'Python 3' --user

it all works out fine, and I get the “Python 3” kernel that can import fastai.
If I try the same thing in the “start” script (in the notebook config), it doesn’t work. If instead of plain activate I use

  /home/ec2-user/anaconda3/bin/activate

the failure is silent (otherwise it complains about not finding activate).

Solved. The start script should be:

#!/bin/bash
set -e

echo "Creating fast.ai conda enviornment"
cat > /home/ec2-user/fastai-setup.sh << EOF
#!/bin/bash
cd /home/ec2-user/SageMaker
source activate envs/fastai
echo "Finished creating fast.ai conda environment"
ipython kernel install --name 'fastai' --display-name 'Python 3' --user
EOF

chown ec2-user:ec2-user /home/ec2-user/fastai-setup.sh
chmod 755 /home/ec2-user/fastai-setup.sh

sudo -i -u ec2-user bash << EOF
echo "Creating fast.ai conda env in background process."
nohup /home/ec2-user/fastai-setup.sh &
EOF
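
If you’d rather register this from the CLI than paste it into the console, a sketch (config, file, and instance names are placeholders; the script above is assumed to be saved locally as on-start.sh):

  # register the script above as an OnStart lifecycle configuration
  # (base64 -w0 is the GNU coreutils flag for no line wrapping)
  aws sagemaker create-notebook-instance-lifecycle-config \
      --notebook-instance-lifecycle-config-name fastai-on-start \
      --on-start Content="$(base64 -w0 on-start.sh)" \
      --region us-west-2
  # attach it to a (stopped) notebook instance
  aws sagemaker update-notebook-instance \
      --notebook-instance-name my-notebook-instance \
      --lifecycle-config-name fastai-on-start \
      --region us-west-2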
1 Like

So instead of the script in the docs shared by Jeremy, we should use the one written by you?

I’ve sent a PR to update the docs, but before you change anything, can you check whether the original instructions work once you’ve stopped the instance and then started it again?
(The point of failure is when you import fastai on the restarted notebook.)
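
If it helps, here’s a rough sketch of that stop/start test from the CLI (the instance name is a placeholder):

  # stop, wait, then start the notebook instance again
  aws sagemaker stop-notebook-instance --notebook-instance-name my-notebook-instance
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name my-notebook-instance
  aws sagemaker start-notebook-instance --notebook-instance-name my-notebook-instance
  aws sagemaker wait notebook-instance-in-service --notebook-instance-name my-notebook-instance
  # then open Jupyter, pick the 'Python 3' (fastai) kernel, and run: import fastai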