Platform: Amazon SageMaker - AWS

matt.mcclean · March 20, 2020, 9:55am

The following is a guide for setting up a Jupyter notebook for the fast.ai course v4. It assumes you already have an AWS account set up. If you do not, follow the instructions here to create and activate your AWS account.

Launch Jupyter notebook instance with AWS CloudFormation

We will use AWS CloudFormation to create a SageMaker Notebook Instance, which provides the Jupyter notebook for running the course exercises. To launch the CloudFormation stack, click the Launch stack link for the region closest to you in the table below.

Region	Code	Link
US West (Oregon) Region	us-west-2	Launch stack
US West (N. California) Region	us-west-1	Launch stack
US East (N. Virginia) Region	us-east-1	Launch stack
US East (Ohio) Region	us-east-2	Launch stack
Canada (Central) Region	ca-central-1	Launch stack
Asia Pacific (Tokyo) Region	ap-northeast-1	Launch stack
Asia Pacific (Seoul) Region	ap-northeast-2	Launch stack
Asia Pacific (Singapore) Region	ap-southeast-1	Launch stack
Asia Pacific (Sydney) Region	ap-southeast-2	Launch stack
Asia Pacific (Mumbai) Region	ap-south-1	Launch stack
Europe (Ireland) Region	eu-west-1	Launch stack
Europe (London) Region	eu-west-2	Launch stack
Europe (Frankfurt) Region	eu-central-1	Launch stack

You should see a screen like the following. Select the Instance Type you want: ml.p2.xlarge has a single Nvidia K80 GPU, and ml.p3.2xlarge has an Nvidia V100 GPU. You can also customize how much disk space you want. The default setting is 50 GB.

Check the tickbox “I acknowledge that AWS CloudFormation might create IAM resources.” and click the Create stack button to provision the needed resources. You should be taken to the CloudFormation page, where the stack status is shown as CREATE_IN_PROGRESS. Wait for the stack status to change to CREATE_COMPLETE.
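
If you would rather wait for CREATE_COMPLETE from a script than refresh the console, the waiting logic can be sketched as below. This is a minimal sketch, not part of the original guide: `wait_for_stack`, its parameters, and the `FAILED_STATUSES` set are illustrative names, and the status source is passed in as a plain callable so the loop itself has no AWS dependency.

```python
import time
from typing import Callable

# Statuses treated here as "creation did not succeed" (an illustrative choice).
FAILED_STATUSES = {
    "CREATE_FAILED",
    "ROLLBACK_IN_PROGRESS",
    "ROLLBACK_COMPLETE",
    "ROLLBACK_FAILED",
}


def wait_for_stack(get_status: Callable[[], str],
                   poll_seconds: float = 15.0,
                   max_polls: int = 80) -> str:
    """Poll get_status() until the stack reaches CREATE_COMPLETE.

    get_status is any zero-argument callable returning the current
    stack status string. Raises RuntimeError if the stack fails or
    rolls back, and TimeoutError if max_polls is exhausted.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "CREATE_COMPLETE":
            return status
        if status in FAILED_STATUSES:
            raise RuntimeError(f"Stack creation failed with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("Stack did not reach CREATE_COMPLETE in time")
```

Assuming boto3 is installed and configured, `get_status` could read `Stacks[0]["StackStatus"]` from the CloudFormation client's `describe_stacks(StackName=...)` response, using whatever name your stack was given. boto3 also ships a `stack_create_complete` waiter that covers the same need.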

Open the SageMaker web console

Open the SageMaker web console by selecting the Services menu item at the top left of your AWS web console, entering the text “Sage”, and then selecting the option Amazon SageMaker, as in the screenshot below.

Open the Jupyter Notebook

On the left navigation bar, choose Notebook instances. This is where we create, manage, and access our notebook instances. You should see that your notebook instance named fastai-v4 has the status InService, as in the screenshot below. Click the Open Jupyter link.

The first time the notebook instance is created, it installs all the fastai2 libraries and dependencies, which can take around 10 minutes.

Open fastai v4 course notebooks

Once you click the Open Jupyter link you will be redirected to the Jupyter notebook web interface with the course notebooks already installed.

The first time you open any of the notebooks you will be asked to select the Jupyter kernel. Select the kernel named fastai2 in the drop down selection like the screenshot below and click the Set Kernel button.

If you do not see the option fastai2, then the fastai libraries and dependencies have not finished installing. Wait up to 10 minutes for this to complete, then refresh the page and try selecting the fastai2 kernel again.

Stop the notebook instance when you have finished the exercises

Please remember to stop your notebook instance when you have finished running the course notebooks, as you will be charged by the hour while it is running. To do this, select the notebook instance and choose the Stop action, as in the screenshot below.
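
The same stop action can be scripted so it is harder to forget. The sketch below is only an illustration: `stop_if_running` is a made-up helper name, and `client` is assumed to behave like `boto3.client("sagemaker")`, whose `describe_notebook_instance` and `stop_notebook_instance` calls it uses.

```python
def stop_if_running(client, name: str) -> bool:
    """Stop the notebook instance `name` if it is currently InService.

    `client` is assumed to behave like boto3.client("sagemaker").
    Returns True if a stop request was sent, False otherwise.
    """
    info = client.describe_notebook_instance(NotebookInstanceName=name)
    if info["NotebookInstanceStatus"] != "InService":
        # Already stopped, stopping, or still starting: nothing to do here.
        return False
    client.stop_notebook_instance(NotebookInstanceName=name)
    return True
```

With boto3 installed and credentials configured, `stop_if_running(boto3.client("sagemaker"), "fastai-v4")` would request a stop for the instance created by this guide.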

Restarting the notebook instance

When you want to go back to the notebook exercises, select your notebook instance, choose the Start action, wait a few minutes, and pick up where you left off. It will take less time to set up because the fastai libraries are already installed, and your notebooks will have been saved.

zmd · March 22, 2020, 9:38pm

Any idea why, when running the first lesson and trying to download the segmentation data, I get this error?

ConnectTimeout: HTTPConnectionPool(host='files.fast.ai', port=80): Max retries exceeded with url: /data/examples/camvid_tiny.tgz (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fcdc07fe898>, 'Connection to files.fast.ai timed out. (connect timeout=4)'))

Should I be changing something in the SageMaker permissions?

matt.mcclean · March 22, 2020, 9:42pm

No, it is not a permissions issue. It seems to be a problem with SageMaker instances downloading from this HTTP endpoint. It has been raised with the service team and they are looking into it. I got around it by downloading the data files to my laptop, uploading them to the notebook instance, and using the URL prefix “file://”.
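
For anyone trying the workaround, a “file://” URL for an uploaded archive can be built with the standard library. This is a small sketch: `local_file_url` is an illustrative helper name, and the example path in the note below is an assumption about where the file was uploaded.

```python
from pathlib import Path


def local_file_url(path: str) -> str:
    """Return a file:// URL for a file uploaded to the notebook instance."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).expanduser().resolve().as_uri()
```

For example, `local_file_url("/home/ec2-user/SageMaker/camvid_tiny.tgz")` gives a URL that starts with `file:///` and ends with the archive name, which can stand in for the HTTP URL that timed out.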

ganesh.bhat · April 1, 2020, 2:12pm

I have set up the environment on AWS.

How do we get the latest updates from git? There is no .git directory in the environment that was set up.

Thanks
Ganesh Bhat

matt.mcclean · April 1, 2020, 2:40pm

You can open a terminal and run git pull from the course-v4 directory to get the latest notebooks.

ganesh.bhat · April 1, 2020, 3:05pm

Thanks

Salazar · April 4, 2020, 9:57pm

I'm getting this error:

The account-level service limit ‘ml.p2.xlarge for notebook instance usage’ is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit. (Service: AmazonSageMaker; Status Code: 400; Error Code: ResourceLimitExceeded; Request ID: 5e6a38e1-f8e4-4c28-ba8b-31e8b0d4a382)

I've tried two different regions and the two different GPUs, and I get the same error each time.

abharani · April 6, 2020, 3:11am

I am also getting the same error.

Salazar · April 6, 2020, 3:28am

I'm trying this, which I found on GitHub. Still waiting for a response.

From the AWS console:

Go to the top right corner of the console and click {your account name} > My Service Quotas
In the left nav, choose “AWS Services”
Search for “ec2”
Select “Amazon Elastic Compute Cloud (Amazon EC2)”
Search for the size of the EC2 instance you want to raise the quota on, e.g. “m4.large”
Click the result
Click the orange “Request Quota Increase” button
In the “Change quota value” box, enter the number required and press the orange Request button. The request has now been made.
Click “Dashboard” in the left nav
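
To see the current value of a quota without clicking through the console, something like the following could work. It is a rough sketch under assumptions: `find_quotas` is a made-up helper, and `client` is assumed to behave like `boto3.client("service-quotas")`, whose `list_service_quotas` call returns pages with `Quotas` and an optional `NextToken`.

```python
def find_quotas(client, service_code: str, needle: str):
    """Return (quota name, value) pairs whose name contains `needle`.

    `client` is assumed to behave like boto3.client("service-quotas").
    Matching is a case-insensitive substring test on the quota name.
    """
    matches = []
    kwargs = {"ServiceCode": service_code}
    while True:
        page = client.list_service_quotas(**kwargs)
        for quota in page.get("Quotas", []):
            if needle.lower() in quota["QuotaName"].lower():
                matches.append((quota["QuotaName"], quota["Value"]))
        token = page.get("NextToken")
        if not token:
            return matches
        # Ask for the next page of results.
        kwargs["NextToken"] = token
```

For the error above, the relevant search would probably be service code `"sagemaker"` with the quota name from the message, “ml.p2.xlarge for notebook instance usage”, rather than `"ec2"`.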

matt.mcclean · April 6, 2020, 8:08am

Correct, you will need to raise a request with AWS support to increase the quota for the “p2.xlarge” or “p3.2xlarge” instance type on your account from 0 to 1.

Salazar · April 7, 2020, 8:07pm

I'm talking to support now, and they asked how many vCPUs I'd need. I told them 4 because the p2.xlarge looks to be the cheapest, at $0.90 per hour.

How are you planning on using this? Just to train models?

matt.mcclean · April 10, 2020, 10:08am

You need the p2.xlarge if you want to train with a GPU which will be much faster than CPU. It has the Nvidia K80 GPU card. The p3.2xlarge has the newer Nvidia V100 GPU which will be faster to train but is more expensive.

Salazar · April 10, 2020, 3:13pm

Looks like they are offering 50 hours of m4.xlarge or m5.xlarge for training for free if you've never used SageMaker before.

christian.acuna · April 12, 2020, 10:19pm

I ran into an exit status 1 error when creating a new notebook instance using the Lifecycle script in the CloudFormation template.
The CloudWatch logs had this error:

Collecting azure
  Downloading azure-5.0.0.zip (4.6 kB)
    ERROR: Command errored out with exit status 1:
     command: /home/ec2-user/SageMaker/.env/fastai2/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-gsyv3tl0/azure/setup.py'"'"'; __file__='"'"'/tmp/pip-install-gsyv3tl0/azure/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-gsyv3tl0/azure/pip-egg-info
         cwd: /tmp/pip-install-gsyv3tl0/azure/
    Complete output (24 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-gsyv3tl0/azure/setup.py", line 60, in <module>
        raise RuntimeError(message)
    RuntimeError:
    
    Starting with v5.0.0, the 'azure' meta-package is deprecated and cannot be installed anymore.
    Please install the service specific packages prefixed by `azure` needed for your application.
    
    The complete list of available packages can be found at:
    https://aka.ms/azsdk/python/all
    
    Here's a non-exhaustive list of common packages:
    
    -  azure-mgmt-compute (https://pypi.python.org/pypi/azure-mgmt-compute) : Management of Virtual Machines, etc.
    -  azure-mgmt-storage (https://pypi.python.org/pypi/azure-mgmt-storage) : Management of storage accounts.
    -  azure-mgmt-resource (https://pypi.python.org/pypi/azure-mgmt-resource) : Generic package about Azure Resource Management (ARM)
    -  azure-keyvault-secrets (https://pypi.python.org/pypi/azure-keyvault-secrets) : Access to secrets in Key Vault
    -  azure-storage-blob (https://pypi.python.org/pypi/azure-storage-blob) : Access to blobs in storage accounts
    
    A more comprehensive discussion of the rationale for this decision can be found in the following issue:
    https://github.com/Azure/azure-sdk-for-python/issues/10646

On April 8, the azure meta-package was deprecated.

One fix is to update the create-notebook script to pin version 4.0.0. I’m not sure where the CloudFormation YAML is stored, but it will need to be updated.

pip install nbdev graphviz azure==4.0.0 azure-cognitiveservices-search-imagesearch sagemaker

matt.mcclean · April 13, 2020, 9:15am

Thanks, I have updated the CFN template with the latest version of the azure package

sabzo · April 19, 2020, 7:37pm

I noticed that the current SageMaker template gives the following error: Exception: sentencepiece module is missing: run `pip install sentencepiece`. This happens even though sentencepiece is installed (it’s in the requirements.txt) and even if I manually install it using pip. Has anyone experienced this yet? It shows up particularly on Lesson 10 (the problem seems to occur on the fastai2 kernel).

sabzo · April 19, 2020, 7:49pm

nvm – the kernel had to be restarted for changes to take effect
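
For anyone hitting the same thing, a quick way to check what the running kernel can see is sketched below. `module_visible` is an illustrative helper, not something from fastai. It only reports whether the current interpreter can locate the module.

```python
import importlib.util
import sys


def module_visible(name: str) -> bool:
    """Return True if the running interpreter can locate module `name`."""
    return importlib.util.find_spec(name) is not None


# Which Python the kernel is running, and whether it can find the package.
print(sys.executable)
print(module_visible("sentencepiece"))
```

If this prints True and the error still appears, the kernel probably loaded fastai before the package was available, so restarting the kernel is the next thing to try.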

arunslb123 · April 23, 2020, 9:15am

Has anyone tried serving fastai2 models using AWS Lambda and API Gateway? Is it possible to install fastai2 on a Lambda layer, given the layer size restrictions?

ganesh.bhat · April 25, 2020, 1:31pm

I did the git pull, but the notebooks don’t have the notes in them. Is there a reason they are missing the notes that are available on GitHub?

FraPochetti · April 27, 2020, 12:13pm

The notebooks in https://github.com/fastai/course-v4/tree/master/nbs don’t have notes.
They are the notes-free versions of the notebooks in https://github.com/fastai/fastbook

The reason behind this choice is to let students practice with the raw code directly, without any additional context.