Hi all,
I’m fairly new to AWS, and I’m not a programmer by trade. I initially followed the getting-started instructions and created an EC2 instance using the 128GB AMI, but being cost conscious, I wanted to find ways to optimize and stay on the AWS free tier for the first 12 months. I thought I’d write out the rough steps I took, in case they’re helpful to any other newbies.
Getting going from “scratch”
First things first: it is easy to create a smaller EC2 instance and get it running. Thanks to the install-gpu.sh shell script, much of the install is taken care of for you. As I mentioned, I first went with the standard 128GB AMI provided by Jeremy and Rachel. Then I found a great thread from vshets on reducing the volume size, but his AMI used the full 30GB of free-tier-eligible EBS, and didn’t really give me a practical way to manage things.
So, using the AWS console, I created a new, slightly smaller EC2 instance (this can also be done from the command line; see the sketch below). For this pass, I went with a 24GB root volume; I have some thoughts on lowering that further, which I’ll get to shortly. After logging in and creating my desired directory structure, I ran the install-gpu.sh script.
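As an aside, if you prefer the CLI, something like the following should launch the same instance with a smaller root volume. This is just a sketch: the AMI ID, key pair, and security group are placeholders you’d swap for your own, and p2.xlarge is the GPU instance type the course uses (note the instance itself is not free-tier).

# Placeholders: swap in the course AMI ID for your region, your key pair,
# and your security group before running.
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type p2.xlarge \
    --key-name my-key \
    --security-group-ids sg-xxxxxxxx \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":24,"VolumeType":"gp2"}}]'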
The install script generally worked, but it didn’t add the nvcc path to my ~/.bashrc file, so I had to add it manually. Adding the following two lines to ~/.bashrc did the trick:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
Oh, and for whatever reason, jupyter notebook wasn’t recognized either, so I added this line to ~/.bashrc as well:
export PATH="/home/ubuntu/anaconda2/bin:$PATH"
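After editing ~/.bashrc, you can reload it and confirm both tools now resolve:

source ~/.bashrc
which nvcc      # should print /usr/local/cuda-8.0/bin/nvcc
which jupyter   # should print /home/ubuntu/anaconda2/bin/jupyter
nvcc --version  # should report CUDA 8.0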
I also had to downgrade Keras to version 1.2.2, as mentioned in another thread:
sudo pip install keras==1.2.2
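A quick one-liner confirms the downgrade took:

python -c "import keras; print(keras.__version__)"   # should print 1.2.2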
At this point everything worked, and I was able to get through lesson 1. I’m currently running my first full pass at the State Farm distracted driver Kaggle competition.
Next Steps
I think I want to shake things up again and create an EC2 instance with an even smaller EBS root volume, holding just the OS and applications. Then I can mount a secondary EBS volume as a data store. This would let me keep a “pure” OS volume, which I can back up as an AMI and use to get up and running with all my applications and settings exactly how I like them. By mounting a second EBS volume as a data directory, I keep the big data hogs (like the 4+ GB State Farm data set) out of my AMI. If I want to back a data volume up for any reason, I can create an EBS snapshot, but since bandwidth is relatively cheap, and you can keep a record of the commands you ran to post-process a data set, I’m thinking I might just treat data volumes as fairly disposable.
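For anyone who hasn’t mounted a second volume before, the steps look roughly like this. A sketch only: the device name /dev/xvdf and the mount point /home/ubuntu/data are assumptions on my part; run lsblk to see what device name your volume actually gets.

lsblk                                    # find the new volume's device name
sudo mkfs -t ext4 /dev/xvdf              # format it (first use only -- this erases the volume!)
mkdir -p /home/ubuntu/data
sudo mount /dev/xvdf /home/ubuntu/data
sudo chown ubuntu:ubuntu /home/ubuntu/data

And to have it remount automatically after a reboot, you can add an fstab entry:

echo '/dev/xvdf /home/ubuntu/data ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab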
One reason this data-volume-vs-OS-volume approach appeals to me is that AWS charges for EBS by GB-months provisioned, and I believe that is based on the full provisioned size, not just the space actually used. So keeping a 128GB volume provisioned indefinitely would be much more expensive than keeping a small OS volume provisioned and attaching a data volume only as needed for that week’s course work. Afterwards, you could either save those data volumes as snapshots and delete them, or just keep them and swap in the data for the next project.
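To make that concrete, here’s a rough back-of-the-envelope comparison, assuming roughly $0.10 per GB-month for gp2 storage (check current pricing for your region):

# 128GB provisioned all month:   128 * $0.10 = $12.80/month
# 16GB OS volume all month:       16 * $0.10 =  $1.60/month
# 30GB data volume for one week:  30 * $0.10 * 7/30 ≈ $0.70

And the snapshot-and-delete step is two CLI calls (the volume ID is a placeholder):

aws ec2 create-snapshot --volume-id vol-xxxxxxxx --description "state farm data"
aws ec2 delete-volume --volume-id vol-xxxxxxxx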
If I end up going the route of an even smaller OS+apps AMI, I’ll follow up to this post with more info on my approach and how it’s working for me. If you’ve done anything similar, please give me feedback.
Thanks,
Zach