I just got back from vacation and tried to start my notebook instance after approximately 13 days, and it’s failing. I do not see any error message in the CloudWatch logs for the notebook instance’s LifeCycleConfigOnStart.
The message next to “Failed” in the notebook instances dashboard says the following: "Notebook Instance Lifecycle config << config name >> for the notebook instance << instance name >> took longer than 5 minutes. Please check your CloudWatch log for more details if your Notebook instance has internet access."
In the CloudWatch logs, the last action is “Updating fastai library”.
Is this because the start script is taking more than 5 minutes?
In the CloudWatch log I see the script trying to download many things, each with the status “Requirement already satisfied”. Is there a way to change the start script so that it takes less than 5 minutes?
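One common way to stay under the 5 minute limit is to let the start script return right away and leave the slow update running in the background. In a shell script this is usually done with `nohup ... &`. The sketch below shows the same idea in Python, assuming the start script is allowed to call a small Python helper. The function name `run_detached` and the log path are hypothetical and not part of the fastai setup.

```python
import subprocess


def run_detached(cmd, log_path):
    """Start `cmd` in the background and return without waiting for it.

    Output is appended to `log_path` so it can be inspected later.
    """
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # detach from the caller's session (POSIX only)
        )
    finally:
        # The child holds its own copy of the file descriptor,
        # so the parent can close its handle immediately.
        log.close()
    return proc
```

A call such as `run_detached(["pip", "install", "--upgrade", "fastai"], "/tmp/fastai-update.log")` would return at once while the update continues. Note that the notebook may then open before the update has finished.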
Did I lose my work?
As you mentioned earlier, if I follow the step below, will I be able to recover the work I did so far? (I don’t want to lose it.)
I removed the start script from the notebook configuration and was able to open the notebook instance. I then zipped the folder containing my work and downloaded it! Happy that I was able to save my work. Thanks @matt.mcclean for suggesting that approach.
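For anyone wanting to do the same from a notebook cell rather than a terminal, here is a minimal sketch using only the Python standard library. The function name `zip_work_folder` and the example paths are illustrative assumptions. Write the archive outside the folder being zipped so it does not end up inside itself.

```python
import shutil
from pathlib import Path


def zip_work_folder(folder, archive_base):
    """Zip `folder` into `<archive_base>.zip` and return the archive path."""
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    # root_dir/base_dir keep the folder name as the top-level entry in the zip.
    return shutil.make_archive(
        str(archive_base),
        "zip",
        root_dir=str(folder.parent),
        base_dir=folder.name,
    )
```

For example, `zip_work_folder("/home/ec2-user/SageMaker/my-work", "/home/ec2-user/SageMaker/my-work-backup")` would produce `my-work-backup.zip` next to the folder, which can then be downloaded from the Jupyter file browser. The folder name `my-work` is a placeholder.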
Hi, I can’t follow the instructions for SageMaker setup, as documented here: https://course.fast.ai/start_sagemaker.html . Where exactly does one go in the CloudFormation console to reach the Launch Stack and then Create Stack windows, i.e. the first two pictures shown in the document? Sorry if I’ve missed something simple. Thanks.
@matt.mcclean Do you have the video link for your re:Invent talk? I checked the example notebook, but its content is a bit hard to follow. I did find one video from an AWS Summit: https://www.youtube.com/watch?v=1kJf0Lvzj8A But I think that is a bit old, and I also cannot find the code.
I’ve included my steps and questions below. I hope you can provide some insights, and feel free to point out anything I missed.
upload model to S3: I am not clear on what needs to be included. It sounds like we need the data and the learner?
create/deploy Docker image for inference to ECR: Do you have a Dockerfile example?
setup endpoint: No question so far
scale up by using Amazon SageMaker Elastic Inference: I am not sure there is an example for this, since it is pretty new.
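On the first step, SageMaker expects model artifacts in S3 as a gzipped tar archive, conventionally named `model.tar.gz`. The sketch below assumes fastai v1, where `learn.export()` writes an `export.pkl` file that can be loaded for inference. Whether that single file is enough depends on the inference code in the container. The function names, bucket and key are hypothetical.

```python
import tarfile
from pathlib import Path


def package_model(export_file, archive_path="model.tar.gz"):
    """Pack an exported model file into a gzipped tar archive."""
    export_file = Path(export_file)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Store the file at the archive root, without its directory prefix.
        tar.add(str(export_file), arcname=export_file.name)
    return Path(archive_path)


def upload_to_s3(archive_path, bucket, key):
    """Upload the archive to S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so packaging works without boto3 installed

    boto3.client("s3").upload_file(str(archive_path), bucket, key)
```

Usage would look like `upload_to_s3(package_model("export.pkl"), "my-bucket", "models/pets/model.tar.gz")`, with the bucket name and key replaced by your own.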
jupyter labextension install jupyter-matplotlib
(also need to refresh page and restart kernel)
I’m able to install these after the fact (it seems I have to every time I start/stop my notebook), but my attempts to add these steps to a copy of the fastai lifecycle config script have been failing. The widgets are required for some fastai notebook applications and don’t seem to work by default with JupyterLab. For example:
%matplotlib widget
import ipywidgets as widgets
from ipywidgets import interact
def f(x):
    return x
interact(f, x=10);
I just created a new stack and everything is fine. However, in lesson 1, the cell with learn.fit_one_cycle(4) is taking 15+ minutes to complete each epoch. In the video this only took 2 minutes for all 4 epochs. I used the “ml.p2.xlarge” instance type. Is this instance really that much slower, or is something wrong with my setup?
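A gap that large may mean training is running on the CPU rather than the GPU, though that is only a guess from the timings. A quick check, assuming the notebook kernel uses PyTorch (which fastai is built on), is to ask PyTorch whether it can see a CUDA device. The helper name `describe_training_device` is made up for this example.

```python
def describe_training_device():
    """Report whether PyTorch can see a GPU in the current kernel."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this kernel"
    if torch.cuda.is_available():
        return "GPU available: " + torch.cuda.get_device_name(0)
    return "No GPU visible to PyTorch; training will run on the CPU"


print(describe_training_device())
```

If no GPU is reported, the kernel or environment is worth checking before blaming the instance type.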
You can install custom libraries and jupyter extensions in your environment by adding install or config commands to the script located here: /home/ec2-user/SageMaker/custom-start-script.sh.
The default behaviour of this script is to update the fastai library and course notebooks.
For example, I have enabled the jupyter collapsible headings extension with the following command in the script file:
The steps are a lot easier now that the fastai library is bundled into the pre-built PyTorch container for SageMaker. I have set up a project here that shows how to train and deploy the lesson 1 example (Pets) with SageMaker.