Platform: SageMaker ✅

jerbly · December 16, 2018, 1:15am

I fixed this by hosting my own copy of sagemaker-create in an S3 bucket and removing the -nightly suffixes from the conda create line, like so:

conda create -mqyp envs/fastai -c pytorch -c fastai python=3.6 jupyter pytorch fastai cuda92 torchvision

Then edit the create script in the lifecycle:

aws s3 cp s3://your-bucketname-here/sagemaker-create .
#wget -N https://course-v3.fast.ai/setup/sagemaker-create

Update the notebook's IAM role to include the AmazonS3FullAccess policy so that the aws s3 call above is permitted.
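
Full S3 access is broader than a single-object copy strictly needs. As a minimal sketch, assuming the placeholder bucket and object names from the command above, a narrower policy document could be built like this. The helper name is hypothetical, and the policy has not been tested against this setup:

```python
import json


def read_only_object_policy(bucket: str, key: str) -> dict:
    """Build an IAM policy document that allows s3:GetObject on one object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # ARN of the single object the lifecycle script downloads
                "Resource": f"arn:aws:s3:::{bucket}/{key}",
            }
        ],
    }


# Placeholder names taken from the example command, not real resources
policy = read_only_object_policy("your-bucketname-here", "sagemaker-create")
print(json.dumps(policy, indent=2))
```

The printed JSON could then be attached to the role as an inline policy in place of the full-access one.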

The final issue I faced was that fastai 1.0.37 depends on dataclasses, which I had to pip install.
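
A quick way to spot this kind of missing dependency before opening a notebook is to ask Python whether the module can be found. This is a minimal sketch using only the standard library; the function name is hypothetical:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# dataclasses is part of the standard library from Python 3.7 onward;
# on Python 3.6 it is a separate backport package.
missing = missing_modules(["dataclasses"])
if missing:
    print("Missing, install with: pip install " + " ".join(missing))
else:
    print("All listed modules are importable.")
```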

Now I have a working SageMaker notebook again!

amallya · December 28, 2018, 1:29am

Hi Matt,

I just got back from vacation and tried to start my notebook instance after approximately 13 days, and it's failing. I do not see any error message in the CloudWatch logs for the notebook instance’s LifeCycleConfigOnStart.
The message next to “Failed” in the notebook instances dashboard says: "Notebook Instance Lifecycle config << config name >> for the notebook instance << instance name>> took longer than 5 minutes. Please check your CloudWatch log for more details if your Notebook instance has internet access."
In the CloudWatch logs, the last action is "Updating fastai library".

Is this because the start script is taking more than 5 minutes?
In the CloudWatch log I see the script trying to download many things, each with the status "Requirement already satisfied". Is there a way to change the start script so that it takes less than 5 minutes?
Did I lose my work?
As you mentioned earlier, if I follow the step below, will I be able to recover the work I have done so far? I don’t want to lose it.

Is there any new way to recover the work?
Note - I had followed https://course-v3.fast.ai/start_sagemaker.html#getting-set-up to set up.

I don’t want to lose the work I have done so far, so I would appreciate any help.
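
One general way to keep a start script under a time limit like the 5 minutes quoted above is to launch the slow step in the background and return immediately. This is a minimal sketch in Python, assuming a POSIX system. The helper name and log path are hypothetical, and this is not the course's actual start script:

```python
import subprocess


def run_detached(command, log_path):
    """Start `command` in its own session and return without waiting for it."""
    log = open(log_path, "ab")
    proc = subprocess.Popen(
        command,
        stdout=log,                # send output to the log file
        stderr=subprocess.STDOUT,  # merge errors into the same log
        stdin=subprocess.DEVNULL,
        start_new_session=True,    # detach from the caller's session (POSIX only)
    )
    log.close()  # the child process keeps its own handle to the log
    return proc
```

For instance, run_detached(["pip", "install", "--upgrade", "fastai"], "/tmp/fastai-update.log") would start an update and return right away. The trade-off is that the library may still be updating when the notebook first opens.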

amallya · January 2, 2019, 6:48pm

I removed the start script from the notebook configuration and was able to open the notebook instance. I then zipped the folder containing my work and downloaded it! Happy that I was able to save my work. Thanks @matt.mcclean for suggesting that approach.
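
The zip step can also be run from a notebook cell or terminal on the instance. This is a minimal sketch using only the standard library; the function name and the folder names in the usage note are hypothetical:

```python
import shutil
from pathlib import Path


def archive_folder(folder, archive_stem):
    """Zip `folder` into `<archive_stem>.zip` and return the archive's path."""
    folder = Path(folder)
    return shutil.make_archive(
        str(archive_stem),
        "zip",
        root_dir=str(folder.parent),  # paths in the archive are relative to this
        base_dir=folder.name,         # only this folder is included
    )
```

For example, archive_folder("/home/ec2-user/SageMaker/my-work", "/home/ec2-user/SageMaker/my-work-backup") would write my-work-backup.zip next to the folder, ready to download from the Jupyter file browser.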

gnchen · January 26, 2019, 10:35pm

Turns out I missed the clickable links on the setup page. I thought they were pictures. Sorry. Moving on now.

mooresjx · January 31, 2019, 8:28pm

Hi, I can’t follow the instructions for SageMaker setup, as documented here: https://course.fast.ai/start_sagemaker.html . Where does one go exactly in CloudFormation console to click on the Launch Stack and then Create Stack windows, i.e. the first two pictures shown in the document? Sorry if I’ve missed something simple. Thanks.

gnchen · February 1, 2019, 3:35pm

@mooresjx Looks like you missed the same thing I did. "Launch stack" on https://course.fast.ai/start_sagemaker.html is a clickable link.

gnchen · February 1, 2019, 3:37pm

@matt.mcclean or anyone: is there a how-to guide for deploying models created in fastai to SageMaker, and then customizing the Docker image to run inference code?

matt.mcclean · February 1, 2019, 4:33pm

I have been meaning to write up a guide to deploy your trained fastai model as a SageMaker endpoint.

I did a talk at AWS re:Invent last year with Andrew Shaw showing how to do this. You can check out an example notebook here: https://github.com/mattmcclean/sagemaker-fastai-example/blob/master/chalk_talk_demo.ipynb

gnchen · February 1, 2019, 4:56pm

Thanks @matt.mcclean. Will take a look first.

mooresjx · February 1, 2019, 5:02pm

THANKS!! That was so not obvious :))

gnchen · February 3, 2019, 5:47pm

@matt.mcclean Do you have the video link for your re:Invent talk? I checked the example notebook, but its content is a bit hard to follow. I did find one video from an AWS Summit, https://www.youtube.com/watch?v=1kJf0Lvzj8A, but I think it is a bit old and I cannot find the code for it either.

I have listed the steps and my questions below. I hope you can provide some insights, and feel free to point out anything I missed.

1. Upload the model to S3: I am not clear on what needs to be included. It sounds like we need the data and the learner?
2. Create a Docker image for inference and deploy it to ECR: do you have an example Dockerfile?
3. Set up the endpoint: no questions so far.
4. Scale up using Amazon SageMaker Elastic Inference: I am not sure there is an example for this, since it is pretty new.

Gen
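
On the first step in the list above: SageMaker reads model artifacts from S3 as a gzipped tarball, conventionally named model.tar.gz. Which files belong inside depends on the inference code. This minimal sketch assumes a single exported learner file, hypothetically named export.pkl, and uses a hypothetical function name:

```python
import tarfile
from pathlib import Path


def package_model(files, output="model.tar.gz"):
    """Bundle the given files at the root of a gzipped tarball."""
    with tarfile.open(str(output), "w:gz") as tar:
        for f in files:
            f = Path(f)
            # arcname drops the directory part so each file sits at the archive root
            tar.add(str(f), arcname=f.name)
    return Path(output)
```

A call such as package_model(["export.pkl"]) would produce the tarball, which could then be uploaded with aws s3 cp as in the earlier posts.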

ashhimself · February 18, 2019, 11:56am

I would like to see this talk as well, if anyone knows the link. I had a quick look around too! @matt.mcclean

jerbly · February 18, 2019, 1:38pm

I was at that talk at re:Invent. It was a “chalk talk”; sadly, those are not recorded.

ashhimself · February 18, 2019, 1:54pm

Dang, thanks for letting me know @jerbly

austinmw · February 18, 2019, 10:19pm

@matt.mcclean Hi, could this lifecycle configuration be updated with the necessary components to get Matplotlib widgets working in jupyterlab?

It looks like the environment also needs:

pip install ipywidgets
pip install ipympl
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install jupyter-matplotlib
(also need to refresh page and restart kernel)

I’m able to install these after the fact (seems to be every time I start/stop my notebook), but attempting to add these steps to a copy of the fastai lifecycle config script has been failing for me. The widgets are required for some fastai notebook applications, and don’t seem to work by default with jupyterlab. For example:

%matplotlib widget
import ipywidgets as widgets
from ipywidgets import interact

def f(x):
    return x

interact(f, x=10);

sublimemm · February 20, 2019, 4:59am

I just created a new stack and everything is fine. However, in lesson 1, the cell with learn.fit_one_cycle(4) is taking 15+ minutes to complete each epoch. In the video this only took 2 minutes for all 4 epochs. I used the “ml.p2.xlarge” instance type. Is this instance really that much slower, or is something wrong with my setup?
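
A slowdown of this size is often worth checking against whether PyTorch can see the GPU at all. This is a minimal sketch to run in a notebook cell using the fastai environment; the function name is hypothetical, and it only reports what PyTorch sees rather than diagnosing the cause:

```python
def gpu_status():
    """Describe whether PyTorch can see a CUDA device in this environment."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible (training would run on CPU)"
    # Report the name of the first visible CUDA device
    return "CUDA device visible: " + torch.cuda.get_device_name(0)


print(gpu_status())
```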

matt.mcclean · February 20, 2019, 4:42pm

You can install custom libraries and jupyter extensions in your environment by adding install or config commands to the script located here: /home/ec2-user/SageMaker/custom-start-script.sh.

The default behaviour of this script is to update the fastai library and course notebooks.

For example I have enabled the jupyter collapsible headings extension with the following command in the script file:

jupyter nbextension enable collapsible_headings/main

matt.mcclean · February 20, 2019, 4:42pm

Sorry, it was not recorded, as it was a Chalk Talk and not a breakout session.

matt.mcclean · February 20, 2019, 6:04pm

The steps are a lot easier now that the fastai library is bundled into the pre-built PyTorch container for SageMaker. I have set up a project here that shows how to train and deploy the lesson 1 example (Pets) with SageMaker.

sublimemm · February 20, 2019, 7:36pm

I decided to try googling how to verify if my gpu was being used… I’m getting this using the fastai env