Sagemaker Notebook deployment problem - no FastAi kernel

Hi guys!

I have faced further issues with running the Notebook instance after applying the initial solution as described above. Specifically, the notebooks are created as expected, however, after stopping the instance (fastai-v4) and coming back to work and restarting it, I would be getting a Failed (to start) error:

Notebook Instance Lifecycle Config ‘arn:aws:sagemaker:us-east-2:255007536151:notebook-instance-lifecycle-config/fastai-v4lifecycleconfig’ for Notebook Instance ‘arn:aws:sagemaker:us-east-2:255007536151:notebook-instance/fastai-v4’ took longer than 5 minutes. Please check your CloudWatch logs for more details if your Notebook Instance has Internet access.

The error in CloudWatch OnStart log exposed that conda install inconsistency persisted if updates are allowed OnStart:

..
==> WARNING: A newer version of conda exists. <==
  current version: 4.8.4
  latest version: 4.10.1
Please update conda by running

    $ conda update -n base -c defaults conda

After further investigation, I wound up editing the FastaiSageMakerStack template (sagemaker-cfn-course-v4.yml) by removing updating conda section in OnStart script section:

templateURL = https://fastai-cfn.s3.amazonaws.com/sagemaker-cfn-course-v4.yml
stackName = FastaiSageMakerStack

This results in updating conda env only after the initial creation (OnCreate) of the Notebook instance (fastai-v4) and not attempting to update it on subsequent starts (OnStart). While this appears to ‘freeze’ the installed conda env version, it has solved the issue.

Here is the code change applied to OnStart section of the template:

original ver (modified per above posts):

             echo "Updating conda"
              conda update --force-reinstall conda -y
              conda update -n base -c defaults conda -y
              conda update --all -y

updated version (last three lines removed in OnStart section):

             echo "Updating conda - skipped in FastaiSageMakerStack template v6 (sagemaker-cfn-course-v6.yml)"

Below is the updated ver (sagemaker-cfn-course-v4_updated.yml) of the stack template to install the stack from scratch instead of creating the stack from the templates listed here: Amazon SageMaker | Practical Deep Learning for Coders (which results in the error).

To create the stack from scratch, save the template below as sagemaker-cfn-course-v4_updated.yml file. Then delete the failing stack and create it from scratch from this template.

Checking the CloudWatch logs (OnCreate and OnStart) should reflect now errors and successful avoiding of attempting to update conda OnStart

Parameters:
  InstanceType:
    Type: String
    Default: ml.p2.xlarge
    AllowedValues:
      - ml.p3.2xlarge
      - ml.p2.xlarge
    Description: Enter the SageMaker Notebook instance type
  VolumeSize:
    Type: Number
    Default: 50
    Description: Enter the size of the EBS volume attached to the notebook instance
    MaxValue: 17592
    MinValue: 5
Resources:
  Fastai2SagemakerNotebookfastaiv4NotebookRoleA75B4C74:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: sagemaker.amazonaws.com
        Version: "2012-10-17"
      ManagedPolicyArns:
        - Fn::Join:
            - ""
            - - "arn:"
              - Ref: AWS::Partition
              - :iam::aws:policy/AmazonSageMakerFullAccess
    Metadata:
      aws:cdk:path: CdkFastaiv2SagemakerNbStack/Fastai2SagemakerNotebook/fastai-v4NotebookRole/Resource
  Fastai2SagemakerNotebookfastaiv4LifecycleConfigD72E2247:
    Type: AWS::SageMaker::NotebookInstanceLifecycleConfig
    Properties:
      NotebookInstanceLifecycleConfigName: fastai-v4LifecycleConfig
      OnCreate:
        - Content:
            Fn::Base64: >-
              #!/bin/bash


              set -e


              echo "Starting on Create script"


              sudo -i -u ec2-user bash <<EOF

              touch /home/ec2-user/SageMaker/.create-notebook

              EOF


              cat > /home/ec2-user/SageMaker/.fastai-install.sh <<\EOF

              #!/bin/bash

              set -e

              echo "Creating dirs and symlinks"

              mkdir -p /home/ec2-user/SageMaker/.cache

              mkdir -p /home/ec2-user/SageMaker/.fastai

              [ ! -L "/home/ec2-user/.cache" ] && ln -s /home/ec2-user/SageMaker/.cache /home/ec2-user/.cache

              [ ! -L "/home/ec2-user/.fastai" ] && ln -s /home/ec2-user/SageMaker/.fastai /home/ec2-user/.fastai


              echo "Updating conda"

              conda update --force-reinstall conda -y
            
              conda update -n base -c defaults conda -y

              conda update --all -y

              echo "Starting conda create command for fastai env"

              conda create -mqyp /home/ec2-user/SageMaker/.env/fastai python=3.6

              echo "Activate fastai conda env"

              conda init bash

              source ~/.bashrc

              conda activate /home/ec2-user/SageMaker/.env/fastai

              echo "Install ipython kernel and widgets"

              conda install ipywidgets ipykernel -y

              echo "Installing fastai lib"

              pip install -r /home/ec2-user/SageMaker/fastbook/requirements.txt

              pip install fastbook sagemaker

              echo "Installing Jupyter kernel for fastai"

              python -m ipykernel install --name 'fastai' --user

              echo "Finished installing fastai conda env"

              echo "Install Jupyter nbextensions"

              conda activate JupyterSystemEnv

              pip install jupyter_contrib_nbextensions

              jupyter contrib nbextensions install --user

              echo "Restarting jupyter notebook server"

              pkill -f jupyter-notebook

              rm /home/ec2-user/SageMaker/.create-notebook

              echo "Exiting install script"

              EOF


              chown ec2-user:ec2-user /home/ec2-user/SageMaker/.fastai-install.sh

              chmod 755 /home/ec2-user/SageMaker/.fastai-install.sh


              sudo -i -u ec2-user bash <<EOF

              nohup /home/ec2-user/SageMaker/.fastai-install.sh &

              EOF


              echo "Finishing on Create script"
      OnStart:
        - Content:
            Fn::Base64: >-
              #!/bin/bash


              set -e


              echo "Starting on Start script"


              sudo -i -u ec2-user bash << EOF

              if [[ -f /home/ec2-user/SageMaker/.create-notebook ]]; then
                  echo "Skipping as currently installing conda env"
              else
                  # create symlinks to EBS volume
                  echo "Creating symlinks"
                  ln -s /home/ec2-user/SageMaker/.fastai /home/ec2-user/.fastai
                  echo "Updating conda - skipped in FastaiSageMakerStack template v4_updated (sagemaker-cfn-course-v4_updated.yml)"
                  echo "Activate fastai conda env"
                  conda init bash
                  source ~/.bashrc
                  conda activate /home/ec2-user/SageMaker/.env/fastai
                  echo "Updating fastai packages"
                  pip install fastai fastcore sagemaker --upgrade
                  echo "Installing Jupyter kernel"
                  python -m ipykernel install --name 'fastai' --user
                  echo "Install Jupyter nbextensions"
                  conda activate JupyterSystemEnv
                  pip install jupyter_contrib_nbextensions
                  jupyter contrib nbextensions install --user
                  echo "Restarting jupyter notebook server"
                  pkill -f jupyter-notebook
                  echo "Finished setting up Jupyter kernel"
              fi

              EOF


              echo "Finishing on Start script"
    Metadata:
      aws:cdk:path: CdkFastaiv2SagemakerNbStack/Fastai2SagemakerNotebook/fastai-v4LifecycleConfig
  Fastai2SagemakerNotebookfastaiv4NotebookInstance7C46E7E0:
    Type: AWS::SageMaker::NotebookInstance
    Properties:
      InstanceType:
        Ref: InstanceType
      RoleArn:
        Fn::GetAtt:
          - Fastai2SagemakerNotebookfastaiv4NotebookRoleA75B4C74
          - Arn
      DefaultCodeRepository: https://github.com/fastai/fastbook
      LifecycleConfigName: fastai-v4LifecycleConfig
      NotebookInstanceName: fastai-v4
      VolumeSizeInGB:
        Ref: VolumeSize
    Metadata:
      aws:cdk:path: CdkFastaiv2SagemakerNbStack/Fastai2SagemakerNotebook/fastai-v4NotebookInstance
  CDKMetadata:
    Type: AWS::CDK::Metadata
    Properties:
      Modules: aws-cdk=1.60.0,@aws-cdk/aws-iam=1.60.0,@aws-cdk/aws-sagemaker=1.60.0,@aws-cdk/cloud-assembly-schema=1.60.0,@aws-cdk/core=1.60.0,@aws-cdk/cx-api=1.60.0,@aws-cdk/region-info=1.60.0,jsii-runtime=node.js/v14.8.0
    Condition: CDKMetadataAvailable
Conditions:
  CDKMetadataAvailable:
    Fn::Or:
      - Fn::Or:
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-east-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-northeast-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-northeast-2
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-south-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-southeast-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-southeast-2
          - Fn::Equals:
              - Ref: AWS::Region
              - ca-central-1
          - Fn::Equals:
              - Ref: AWS::Region
              - cn-north-1
          - Fn::Equals:
              - Ref: AWS::Region
              - cn-northwest-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-central-1
      - Fn::Or:
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-north-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-west-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-west-2
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-west-3
          - Fn::Equals:
              - Ref: AWS::Region
              - me-south-1
          - Fn::Equals:
              - Ref: AWS::Region
              - sa-east-1
          - Fn::Equals:
              - Ref: AWS::Region
              - us-east-1
          - Fn::Equals:
              - Ref: AWS::Region
              - us-east-2
          - Fn::Equals:
              - Ref: AWS::Region
              - us-west-1
          - Fn::Equals:
              - Ref: AWS::Region
              - us-west-2

I hope this helps.

Best,
PO