Thoughts on spot instances

Hi all,

I have set up a process for myself to use spot instances (p2.xlarge) instead of on-demand instances, and it is working well and saving me a lot of money. I set up the instance every time and then terminate everything afterwards. This is my “checklist” (I have not yet spent the time to automate it, e.g. using aws-cli, but that could be done as well):

EDIT: I will not update this post if I find bugs or make modifications; instead, I will update (the README.md of) this GitHub repository, where you can find my latest version:
https://github.com/jonas-pettersson/fast-ai
So you can use it if you like.

1) Request Spot Instance
AWS Console -> (Login) -> EC2 Dashboard -> Spot Requests
"Request Spot Instances"

(only changed parameters shown - leave rest as default)
Request type: Request
AMI: Ubuntu Server 16.04 LTS (HVM)
Instance type: p2.xlarge (delete c3.,…)
Set your max price: e.g. 0.3
(Next)

Instance store: attach at launch
EBS volumes / Size: 32 GiB
Security groups: default
(Next / Review)
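If you later want to automate this step from a bash script, aws-cli can make the same request. This is only a sketch, assuming aws-cli is configured; the AMI ID and key name are placeholders you would fill in with your own:

# request a one-time p2.xlarge spot instance with a max price of $0.30/hour
aws ec2 request-spot-instances \
    --spot-price "0.30" \
    --instance-count 1 \
    --type "one-time" \
    --launch-specification '{"ImageId": "ami-xxxxxxxx", "InstanceType": "p2.xlarge", "KeyName": "aws-key", "SecurityGroups": ["default"]}'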

You may need to change the security group settings if you cannot log in to your instance:
AWS Console -> (Login) -> EC2 Dashboard -> Instances
Select instance -> Security Groups -> “default” (or whichever you are using)
Tab “Inbound” -> Edit

Type: SSH
Protocol: TCP
Port Range: 22
Source: 0.0.0.0/0

Type: TCP
Protocol: TCP
Port Range: 8888-8898
Source: 0.0.0.0/0
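The same two inbound rules can also be added with aws-cli. A sketch, assuming the security group is literally named "default":

# allow SSH (port 22) from anywhere
aws ec2 authorize-security-group-ingress --group-name default --protocol tcp --port 22 --cidr 0.0.0.0/0
# allow the Jupyter notebook ports (8888-8898) from anywhere
aws ec2 authorize-security-group-ingress --group-name default --protocol tcp --port 8888-8898 --cidr 0.0.0.0/0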

2) Configure SSH
in cygwin:
cd ~/.ssh
emacs config
copy / paste the HostName (Public DNS) of your AWS instance
It can look something like this:
Host aws-p2
HostName ec2-35-166-166-129.us-west-2.compute.amazonaws.com
User ubuntu
IdentityFile "~/.ssh/aws-key.pem"
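Since the Public DNS changes with every new spot instance, you can also fetch it from the command line instead of copying it from the console. A sketch, assuming aws-cli is configured and only one instance is running:

# print the Public DNS name of the running instance
aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" --query "Reservations[].Instances[].PublicDnsName" --output text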

3) Login
in cygwin:
ssh aws-p2

4) Setup AWS Instance
on aws-instance:
git clone https://github.com/jonas-pettersson/fast-ai-courses
(this is my forked copy of https://github.com/fastai/courses/ including my own work)
./fast-ai-courses/setup/install-gpu.sh

sudo apt install python-pip
(pip is not installed by the script)
pip install kaggle-cli

sudo apt-get install unzip
(unzip is not installed by the script)

pip install backports.shutil_get_terminal_size
(otherwise jupyter notebook does not work properly)

5) Setup for Kaggle Competition
on aws-instance:
cd fast-ai
mkdir data
cd data
mkdir dogs-cats-redux
cd dogs-cats-redux
mkdir models
(this is the directory structure I use)

tmux
kg config -g -u "your_kaggle_username" -p "your_kaggle_password" -c "your_kaggle_competition"

cd ~/fast-ai/data/dogs-cats-redux
~/fast-ai/setup_kg.sh
(this is my own setup script for Kaggle, setting up directories, creating the
validation set, sample sets etc.:
https://github.com/jonas-pettersson/fast-ai/blob/master/setup_kg.sh)
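To give an idea of what such a script does (this is only an illustrative sketch with made-up subset sizes, not the actual contents of setup_kg.sh), it downloads the competition data and carves out a validation set plus small sample sets:

# download and unzip the competition data into the current directory
kg download
unzip -q train.zip
unzip -q test.zip
# move a random subset of the training images into a validation set
mkdir -p valid
ls train | shuf -n 2000 | xargs -I {} mv train/{} valid/
# copy a small random sample for quick experiments
mkdir -p sample/train sample/valid
ls train | shuf -n 200 | xargs -I {} cp train/{} sample/train/
ls valid | shuf -n 50 | xargs -I {} cp valid/{} sample/valid/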

6) Transfer Files
transfer the files you need from your local machine using rsync, e.g.
in cygwin:
rsync -avp --progress dogs-cats-redux-model.h5 aws-p2:~/fast-ai/data/dogs-cats-redux/models
Unfortunately this takes some time and is a drawback of the spot-instance approach, so it is best to consider carefully what you really need.

7) Start Working
on aws-instance:
cd fast-ai
jupyter notebook

8) Save Results
After you’re done, you will want to transfer your models back:
in cygwin:
rsync -avp --progress aws-p2:~/fast-ai/data/dogs-cats-redux/models/dogs-cats-redux-model.h5 .
You will also want to save your notebooks / scripts etc. to GitHub
on aws-instance:
git add ...
git commit -m "..."
git push origin master

9) Terminate Instance
Make sure you saved everything you need
AWS Console -> (Login) -> EC2 Dashboard -> Spot Requests -> Actions -> cancel spot request + check box "Terminate instances"
Check in EC2 Dashboard -> Instances that your instance is terminated
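Cancelling and terminating can also be done from the command line if you script the teardown. A sketch, assuming aws-cli is configured; the request and instance IDs are placeholders:

# cancel the spot request (cancelling alone does not terminate a running instance)
aws ec2 cancel-spot-instance-requests --spot-instance-request-ids sir-xxxxxxxx
# terminate the instance explicitly
aws ec2 terminate-instances --instance-ids i-xxxxxxxxxxxxxxxxx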


After terminating the instance, will all data be lost? If so, maybe a better solution is to have a detachable volume that can be attached to the spot instance and detached after termination. I think there should be a way to automate the process using a bash script, but my skills are not good enough to do that :slight_smile:

Edit: I think it is possible to do all the steps required to request and set up a spot instance automatically in a bash script. Is there anyone here proficient enough at bash scripting who could give some advice?

Yes, all data is lost after termination. The problem with the detachable volume (I tried that too) is that you pay for it even when it is not used. If you really want to get down to $0/hour when not using AWS, then termination seems to be the only way. In all other cases my bill kept increasing (even if by a small amount).

Thanks for sharing this other approach. I wonder how much money (in total, roughly) you spent using this method to complete the course? Thanks in advance.

For those of you using spot instances, EFS is a good way to manage persistent data. This way you can store all your data on an EFS directory and mount it on any spot (or on-demand) instance at startup: http://docs.aws.amazon.com/efs/latest/ug/mount-fs-auto-mount-onreboot.html
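For anyone curious what that looks like in practice, it is an NFSv4.1 mount along these lines (a sketch; the file system ID and region are placeholders, and the linked page covers making the mount persist across reboots via /etc/fstab):

# install the NFS client, create a mount point, and mount the EFS file system
sudo apt-get install -y nfs-common
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-xxxxxxxx.efs.us-west-2.amazonaws.com:/ /mnt/efs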


Sorry for the late response (I was on vacation). In the beginning, when using the approach described in the course (based on the AMI, not using spot instances), I had quite a high bill of several dollars per day. At the end of January I had a bill of about 40 dollars, even though I was careful to stop the instance when I was not using it.
Using the approach with the spot instance, my expenses are minimal. I usually pay less than a dollar for working a few hours at a time. My current bill for February is just 2 dollars.
One problem with the AMI provided in the course is that it has a 128 GiB storage volume, which you also pay for while the instance is stopped. There is unfortunately no easy way to reduce the storage volume of an existing instance (this topic was the subject of other threads in this forum), so I was stuck with that. For the spot instance I use 32 GiB, and it is sufficient for everything I need.


Isn’t EFS a lot more expensive than EBS (up to 10x)? And I wasn’t sure, but from reading the docs I thought there might be speed tradeoffs too. The main advantage seems to be that you can use the same storage for multiple machines at the same time (which you can’t do with EBS).

I’ve been using an extra EBS volume which I attach to whichever instance I’m running. You can also set your machine to automount the EBS volume.

EBS is definitely preferable if you don’t need multi-instance access (and can ensure your spot instances always start in the same availability zone).
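To make the extra-EBS-volume approach above concrete, attaching and auto-mounting can look roughly like this (a sketch; the volume ID, instance ID, and device names are placeholders, and the volume typically shows up as /dev/xvdf on the instance):

# attach the existing data volume to the running spot instance
aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxxxxxxxxxxx --device /dev/sdf
# on the instance: mount it now and add an fstab entry so it mounts on boot
sudo mkdir -p /data
sudo mount /dev/xvdf /data
echo '/dev/xvdf /data ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab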

Instead of rsyncing, it might be worth saving the data to S3. The free tier allows you to store 5 GB for a year, and 1 GB inbound + 1 GB outbound data transfer are free per region per month. Additional storage is like $0.34/10 GB/month, definitely worth it for me considering how long rsync takes every time.
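A minimal example of that round trip with aws-cli (a sketch; the bucket name is a placeholder and assumes aws-cli is configured with credentials for it):

# push results to S3 before terminating the spot instance
aws s3 sync ~/fast-ai/data/dogs-cats-redux/models s3://my-fastai-bucket/dogs-cats-redux/models
# pull them back down on the next instance
aws s3 sync s3://my-fastai-bucket/dogs-cats-redux/models ~/fast-ai/data/dogs-cats-redux/models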

I am also using spot instances. I used not the image from the course but rather the Deep Learning AMI provided by Amazon (which has all the software and a 50 GB EBS volume), plus Terraform (~DSL for creating AWS infrastructure) and bash (user data script) to bid on and provision spot instances. I use git and S3 (aws s3 sync, which is very fast) to store results. The average price per hour for a p2.xlarge is around 20 cents, plus negligible S3 and EBS costs. I plan to describe my workflow in a blog post, but feel free to ask if you’re interested.


Sharing it would be great, thanks in advance! More generally, great advice from all of you about spot instances here. About $2k is needed today to build a proper rig at home or get an equivalent laptop, so I’m on the fence. Thanks again.

Hi,

I am thinking about a setup where I do the main work on a really cheap on-demand EC2 instance.
The scripts and the data are stored on a ~20 GB EBS volume, and the instance itself has the minimum required configuration. During development, you run the notebooks only on small samples of the data, and when everything seems to be working, you just spin up a spot p2 instance, attach the volume, and run on the whole dataset. Has anyone tried this approach?

So the bill should be really small, as the spot instance only runs to perform the final computations, while during the development phase you basically pay only for the EBS volume and IPs (if you run the rest on a free-tier or some very cheap instance).

Yes, I have used that and it works perfectly fine. The same EBS volume can be attached to a GPU instance as well as a regular CPU one (though don’t go for the very small paravirtual instances, as they use a different architecture). Or you can now use Crestle, as it lets you switch between CPU and GPU.


I’ve developed a tool to help automate launching and destroying / backing up spot instances. It’s called spotr:

Let me know what you think.