Platform: SageMaker ✅

Any questions related to SageMaker can be posted here.
This first post will be wiki-fied for helpful references.

  • Tutorial to get started.
    • Note: The current “increase limits” documentation applies to EC2 and does not work with SageMaker. It will be updated once someone verifies the correct procedure (one way to check your current SageMaker limits from the CLI is sketched right after this list).
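
If you want to see your account’s current SageMaker limits from the CLI, here is a rough sketch using the Service Quotas API (the service code and the quota-name filter are assumptions; check them for your region):

  # list SageMaker quotas whose name mentions ml.p2.xlarge (name filter is an assumption)
  aws service-quotas list-service-quotas \
      --service-code sagemaker \
      --region us-west-2 \
      --query "Quotas[?contains(QuotaName, 'ml.p2.xlarge')].[QuotaName,Value]" \
      --output table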

Note that this is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better! To edit, click on the little pencil icon at the bottom of this post. Here’s a pic of what to look for:

[screenshot: the Discourse edit (pencil) button]

4 Likes

When running course v2, there was a step asking to set the kernel to conda_fastai. Is this the case here too? I couldn’t find the option in the dropdown.

Should we really have nothing in the “start notebook” configuration?

Late edit: no, see updated tutorial above

I encountered a resource limit error when trying to set up a SageMaker notebook instance.

To fix this, you can ask for a resource limit increase here: https://console.aws.amazon.com/support/home?region=us-west-2

For those of you taking the course in person, you can apply the AWS credits you received in your email here: https://console.aws.amazon.com/billing/home?region=us-west-2#/credits

P.S. If someone from AWS is reading this, is there a way we can increase this account-level service limit to 1 or 2 for fastai students?
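
For reference, if you’d rather file the increase from the CLI than the support console, a sketch along these lines might work (the quota code below is a placeholder you would need to look up first; the console link above is the documented route):

  # placeholder quota code (L-XXXXXXXX); look it up with list-service-quotas first
  aws service-quotas request-service-quota-increase \
      --service-code sagemaker \
      --quota-code L-XXXXXXXX \
      --desired-value 1 \
      --region us-west-2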

3 Likes

Did you need to request the AWS credits? How do you get those?

I think that’s specific to the in-person version of the course, sorry mate. I can remove that part of my post if it isn’t relevant to a larger audience.

You should use the limit request info here: https://course-v3.fast.ai/start_aws.html#step-2-request-service-limit

1 Like

One small note - those look like instructions for requesting an EC2 service limit increase, as opposed to SageMaker specifically. I wonder if internally AWS treats the p2.xlarge instances differently from the ml.p2.xlarge?

I also wonder if it’s worth adding a SageMaker-specific service limit increase section to http://course-v3.fast.ai/start_sagemaker.html as well?

Here’s what my case ended up looking like:

2 Likes

Yes, that would be a good idea. I didn’t realize that SageMaker has a default limit of zero - somehow mine was one already.

(If you happen to have time to help out, a PR to fix our docs would certainly be appreciated: https://github.com/fastai/course-v3/blob/master/docs/start_sagemaker.md )

1 Like

Done. (PR)
I still can’t consistently get the conda_fastai kernel to appear as an option.
In the v2 version, the setup instructions put all the setup in the “start” stage of the notebook, and going to the CloudWatch logs would let you see when the startup script completed.

Currently, I’ve been able to get conda_fastai once (after waiting about 15 minutes) but have not been able to replicate that success.
(Thanks)
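
For what it’s worth, you can still check the lifecycle-script output in CloudWatch; a sketch, assuming the usual /aws/sagemaker/NotebookInstances log group and an <instance-name>/LifecycleConfigOnStart stream (the instance name below is a placeholder):

  # dump the OnStart lifecycle log for a notebook instance (names are placeholders)
  aws logs get-log-events \
      --log-group-name /aws/sagemaker/NotebookInstances \
      --log-stream-name "my-notebook-instance/LifecycleConfigOnStart" \
      --region us-west-2 \
      --query 'events[].message' \
      --output text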

I have been getting this error while creating a notebook instance. Can anyone help me out?

Request an increase for the limit here. Note in your request that it is for the fast.ai course.

1 Like

While requesting, it asks for a resource type. What should I select?

EC2 instance iirc

But I am requesting a limit for Amazon SageMaker.

That was my answer.
See https://aws.amazon.com/ec2/instance-types/ under Accelerated Computing.

Another question: does anyone know how to access files from Google Drive through SageMaker?

No, you should follow the tutorial in the top post of this thread. Any other tutorials you find are for old versions, and shouldn’t be used.

The current docs only work when starting a notebook for the first time.
Some of the scripts (which take a second to run) should run on every startup.

I can’t quite figure it out. If I leave the startup script empty, then open a shell and type

  # the persistent SageMaker volume, where the fastai conda env lives under envs/
  cd /home/ec2-user/SageMaker
  # activate the env by its relative path
  source activate envs/fastai
  # register it as a Jupyter kernel (shown as “Python 3”) for the current user
  ipython kernel install --name 'fastai' --display-name 'Python 3' --user

it all works out fine, and I get the “Python 3” kernel that can import fastai.
If I try the same thing in the “start” script (in the notebook config), it doesn’t work. If instead of plain activate I use

  /home/ec2-user/anaconda3/bin/activate

the failure is silent (otherwise it complains about not finding activate).

Solved. The start script should be:

#!/bin/bash
set -e

echo "Creating fast.ai conda enviornment"
cat > /home/ec2-user/fastai-setup.sh << EOF
#!/bin/bash
cd /home/ec2-user/SageMaker
source activate envs/fastai
echo "Finished creating fast.ai conda environment"
ipython kernel install --name 'fastai' --display-name 'Python 3' --user
EOF

chown ec2-user:ec2-user /home/ec2-user/fastai-setup.sh
chmod 755 /home/ec2-user/fastai-setup.sh

sudo -i -u ec2-user bash << EOF
echo "Creating fast.ai conda env in background process."
nohup /home/ec2-user/fastai-setup.sh &
EOF
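
If you’d rather register this from the CLI than paste it into the console, a sketch (config, file, and instance names are placeholders; the script above is assumed to be saved locally as on-start.sh):

  # register the script above as an OnStart lifecycle configuration
  # (base64 -w0 is the GNU coreutils flag for no line wrapping)
  aws sagemaker create-notebook-instance-lifecycle-config \
      --notebook-instance-lifecycle-config-name fastai-on-start \
      --on-start Content="$(base64 -w0 on-start.sh)" \
      --region us-west-2
  # attach it to a (stopped) notebook instance
  aws sagemaker update-notebook-instance \
      --notebook-instance-name my-notebook-instance \
      --lifecycle-config-name fastai-on-start \
      --region us-west-2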
1 Like

So instead of the script in the docs shared by Jeremy, we should use the one written by you?

I’ve sent a PR to update the docs, but before you change anything, can you check whether the original instructions work once you’ve stopped the instance and then started it again?
(The point of failure is when you import fastai on the restarted notebook.)
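
If it helps, here’s a rough sketch of that stop/start test from the CLI (the instance name is a placeholder):

  # stop, wait, then start the notebook instance again
  aws sagemaker stop-notebook-instance --notebook-instance-name my-notebook-instance
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name my-notebook-instance
  aws sagemaker start-notebook-instance --notebook-instance-name my-notebook-instance
  aws sagemaker wait notebook-instance-in-service --notebook-instance-name my-notebook-instance
  # then open Jupyter, pick the 'Python 3' (fastai) kernel, and run: import fastai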