Hi all,
I have set up a process for using spot instances (p2.xlarge) instead of on-demand instances. It is working well and saving me a lot of money. I set up the instance from scratch every time and terminate everything afterwards. This is my “checklist” (I have not yet spent the time to automate it, e.g. using aws-cli, but that could be done as well):
EDIT: I will not update this post if I find bugs or make modifications. Instead I update (the README.md of) this GitHub repository, where you can find my latest version:
https://github.com/jonas-pettersson/fast-ai
So you can use it if you like.
1) Request Spot Instance
AWS Console -> (Login) -> EC2 Dashboard -> Spot Requests
"Request Spot Instances"
(only changed parameters shown - leave rest as default)
Request type: Request
AMI: Ubuntu Server 16.04 LTS (HVM)
Instance type: p2.xlarge (delete c3.,…)
Set your max price: e.g. 0.3
(Next)
Instance store: attach at launch
EBS volumes / Size: 32 GiB
Security groups: default
(Next / Review)
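If you want to script this step with aws-cli, a minimal sketch could look like the following. It only covers the main settings from the list above (instance type, max price, security group, 32 GiB volume). The AMI ID, the key pair name, the spec file name and the root device name /dev/sda1 are placeholders and assumptions, not values from my setup. By default it only prints the aws command; set DRY_RUN=0 to actually run it.

```shell
# Sketch: request a one-time p2.xlarge spot instance with aws-cli.
# Placeholders (assumptions, replace with your own values):
#   AMI_ID   - ID of the Ubuntu Server 16.04 LTS (HVM) AMI in your region
#   KEY_NAME - name of your EC2 key pair
AMI_ID="${AMI_ID:-ami-xxxxxxxx}"
KEY_NAME="${KEY_NAME:-aws-key}"
MAX_PRICE="${MAX_PRICE:-0.3}"
SPEC_FILE="${SPEC_FILE:-spot-spec.json}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Write the launch specification (root device name is an assumption).
write_spec() {
  cat > "$SPEC_FILE" <<EOF
{
  "ImageId": "$AMI_ID",
  "InstanceType": "p2.xlarge",
  "KeyName": "$KEY_NAME",
  "SecurityGroups": ["default"],
  "BlockDeviceMappings": [
    {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 32}}
  ]
}
EOF
}

request_spot() {
  write_spec
  run aws ec2 request-spot-instances \
    --spot-price "$MAX_PRICE" \
    --instance-count 1 \
    --type one-time \
    --launch-specification "file://$SPEC_FILE"
}

request_spot
```

Keeping the launch specification in a separate JSON file makes it easy to reuse the same settings for every new request.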
You may need to change the security group settings if you cannot login to your instance:
AWS Console -> (Login) -> EC2 Dashboard -> Instances
Select instance -> Security Groups -> “default” (or whichever you are using)
Tab “Inbound” -> Edit
Type: SSH
Protocol: TCP
Port Range: 22
Source: 0.0.0.0/0
Type: TCP
Protocol: TCP
Port Range: 8888-8898
Source: 0.0.0.0/0
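The same two inbound rules can probably be added from the command line as well. This is a rough sketch, assuming aws-cli is configured and the security group is called "default". It prints the commands by default; set DRY_RUN=0 to run them.

```shell
# Sketch: open SSH (22) and the notebook ports (8888-8898) on a security group.
SG_NAME="${SG_NAME:-default}"   # assumption: the group used in the spot request
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

open_ports() {
  # SSH access
  run aws ec2 authorize-security-group-ingress \
    --group-name "$SG_NAME" --protocol tcp --port 22 --cidr 0.0.0.0/0
  # Port range used for jupyter notebook
  run aws ec2 authorize-security-group-ingress \
    --group-name "$SG_NAME" --protocol tcp --port 8888-8898 --cidr 0.0.0.0/0
}

open_ports
```

Note that --group-name only works for groups in the default VPC; for other groups you would use --group-id instead.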
2) Configure SSH
in cygwin:
cd ~/.ssh
emacs config
copy / paste the HostName (Public DNS) of your AWS instance
It can look something like this:
Host aws-p2
HostName ec2-35-166-166-129.us-west-2.compute.amazonaws.com
User ubuntu
IdentityFile "~/.ssh/aws-key.pem"
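Since every new spot instance gets a new Public DNS, this entry has to be edited each time. A small helper can do that instead of editing by hand. This is only a sketch: the function name update_ssh_host is made up, and it assumes the entry is always called aws-p2 and uses the key file shown above.

```shell
# Sketch: replace the "Host aws-p2" entry in an ssh config file.
update_ssh_host() {
  # $1 = Public DNS of the instance, $2 = config file (default ~/.ssh/config)
  host_name="$1"
  config_file="${2:-$HOME/.ssh/config}"
  touch "$config_file"
  tmp_file="$(mktemp)"
  # Copy every entry except a previous "Host aws-p2" block.
  awk '
    /^Host[ \t]/ { skip = ($2 == "aws-p2") }
    !skip
  ' "$config_file" > "$tmp_file"
  # Append the entry for the current instance.
  cat >> "$tmp_file" <<EOF
Host aws-p2
    HostName $host_name
    User ubuntu
    IdentityFile ~/.ssh/aws-key.pem
EOF
  mv "$tmp_file" "$config_file"
}
```

Usage would be something like: update_ssh_host ec2-35-166-166-129.us-west-2.compute.amazonaws.com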
3) Login
in cygwin:
cd
ssh aws-p2
4) Setup AWS Instance
on aws-instance:
git clone https://github.com/jonas-pettersson/fast-ai-courses
(this is my forked copy of https://github.com/fastai/courses/ including my own work)
./fast-ai-courses/setup/install-gpu.sh
sudo apt install python-pip
(pip is not installed by the script)
pip install kaggle-cli
sudo apt-get install unzip
(unzip is not installed by the script)
pip install backports.shutil_get_terminal_size
(otherwise jupyter notebook does not work properly)
5) Setup for Kaggle Competition
on aws-instance:
cd
cd fast-ai
mkdir data
cd data
mkdir dogs-cats-redux
cd dogs-cats-redux
mkdir models
(this is the directory structure I use)
cd
tmux
kg config -g -u "your_kaggle_username" -p "your_kaggle_password" -c "your_kaggle_competition"
cd ~/fast-ai/data/dogs-cats-redux
~/fast-ai/setup_kg.sh
(this is my own setup script for Kaggle, which sets up directories, creates the
validation set, sample sets etc.:
https://github.com/jonas-pettersson/fast-ai/blob/master/setup_kg.sh)
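The directory structure from this step can also be created in one go with mkdir -p, which creates all missing parent directories. A minimal sketch, where the function name make_competition_dirs and the BASE_DIR variable are made up for illustration:

```shell
# Sketch: create data/<competition>/models below the base directory.
# BASE_DIR is an assumption; this post uses ~/fast-ai.
BASE_DIR="${BASE_DIR:-$HOME/fast-ai}"

make_competition_dirs() {
  # $1 = competition name, e.g. dogs-cats-redux
  mkdir -p "$BASE_DIR/data/$1/models"
}
```

Usage would be: make_competition_dirs dogs-cats-redux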
6) Transfer Files
Transfer the files you need from your local machine using rsync, e.g.
in cygwin:
rsync -avp --progress dogs-cats-redux-model.h5 aws-p2:~/fast-ai/data/dogs-cats-redux/models
Unfortunately this takes some time, which is a drawback of the spot-instance approach. It is best to consider carefully what you really need.
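If you upload several files, a small wrapper around rsync saves some typing. This is a sketch only: the function name push_files and the REMOTE / REMOTE_DIR variables are made up, and the remote path is given relative to the home directory. The --partial option keeps partially transferred files, so a rerun after an interrupted upload does not have to start from scratch. It prints the command by default; set DRY_RUN=0 to run it.

```shell
# Sketch: upload one or more files to the models directory on the instance.
REMOTE="${REMOTE:-aws-p2}"                                        # ssh host alias
REMOTE_DIR="${REMOTE_DIR:-fast-ai/data/dogs-cats-redux/models}"   # relative to remote home
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

push_files() {
  # Usage: push_files FILE...
  run rsync -av --partial --progress "$@" "$REMOTE:$REMOTE_DIR/"
}
```

Usage would be: push_files dogs-cats-redux-model.h5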
7) Start Working
on aws-instance:
cd fast-ai
jupyter notebook
8) Save Results
When you are done, transfer your models back:
in cygwin:
rsync -avp --progress aws-p2:~/fast-ai/data/dogs-cats-redux/models/dogs-cats-redux-model.h5 .
You will also want to save your notebooks / scripts etc. to GitHub
on aws-instance:
git add ...
git commit -m "..."
git push origin master
9) Terminate Instance
Make sure you have saved everything you need
AWS Console -> (Login) -> EC2 Dashboard -> Spot Requests -> Actions -> cancel spot request + check box "Terminate instances"
Check in EC2 Dashboard -> Instances that your instance is terminated
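The termination step could be scripted with aws-cli too. A rough sketch, assuming you know the spot request ID (sir-...) and the instance ID (i-...); the function name cancel_and_terminate is made up. Cancelling the spot request alone does not terminate the running instance, so both commands are needed. It prints the commands by default; set DRY_RUN=0 to run them.

```shell
# Sketch: cancel the spot request, terminate the instance, then check its state.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

cancel_and_terminate() {
  # $1 = spot request ID (sir-...), $2 = instance ID (i-...)
  run aws ec2 cancel-spot-instance-requests --spot-instance-request-ids "$1"
  run aws ec2 terminate-instances --instance-ids "$2"
  # Show the instance state to check that it is shutting down or terminated
  run aws ec2 describe-instances --instance-ids "$2" \
    --query 'Reservations[].Instances[].State.Name' --output text
}
```

Usage would be: cancel_and_terminate sir-xxxxxxxx i-xxxxxxxx (both IDs are placeholders).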