Automated setup for custom AWS AMI generation (based on plain Ubuntu Xenial)

As a warm-up for Part 2, I thought it would be a good idea to automate building a custom AMI with CUDA/cuDNN and Miniconda (based on plain Ubuntu Xenial), fully scripted so that it can be run non-interactively on the command line, on build servers, etc. So, here’s my gift to all the chatty fast.ai friends this batch.

The main idea was:

  • to learn what is involved in getting from a plain Ubuntu install to a GPU-ready one (the rough steps are sketched right after this list)
  • to version control all aspects of the AMI generation, so it can be easily updated as needed by changing a few lines in the included scripts.
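
To give an idea of what that involves, the provisioning boils down to roughly the following steps. This is a sketch, not the gist’s exact script; the package versions and file names are placeholders, so check the scripts in the gist for the real ones.

```sh
# Rough shape of the provisioning steps (placeholders, not the exact gist script).
set -e

# 1. CUDA toolkit, via NVIDIA's apt repository package for Ubuntu 16.04
sudo dpkg -i cuda-repo-ubuntu1604_<version>_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda

# 2. cuDNN, unpacked from the downloaded tarball over the CUDA install
tar -xzf cudnn-<version>.tgz
sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo ldconfig

# 3. Miniconda, installed non-interactively (-b) to a fixed prefix (-p)
wget -q https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$HOME/miniconda3"
```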

Packer (https://www.packer.io/intro/index.html) is a well-established tool for building AMIs and other machine images, and that’s what I’ve used here.

I’ve included the required files in a Gist here: https://gist.github.com/suvash/c76814f9149d5a1ca4133d3bfbf58c48 (just run make help after getting all the files into a directory).
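To give a feel for it, the Packer template ends up looking roughly like this. It’s a trimmed-down sketch, not the exact gist contents; all the <...> values are placeholders, and the provisioner script names are made up for illustration.

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "<your-region>",
    "source_ami": "<ubuntu-xenial-base-ami-id>",
    "instance_type": "p2.xlarge",
    "ssh_username": "ubuntu",
    "ami_name": "ubuntu-xenial-cuda-miniconda-{{timestamp}}",
    "vpc_id": "<your-vpc-id>",
    "subnet_id": "<your-subnet-id>"
  }],
  "provisioners": [{
    "type": "shell",
    "scripts": ["install-cuda-cudnn.sh", "install-miniconda.sh"]
  }]
}
```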
The setup creates an AMI with the following:

  • Ubuntu Xenial 16.04
  • CUDA + cuDNN
  • Miniconda (because full Anaconda is quite big and probably unnecessary)

Some things to know BEFORE YOU RUN make build:

  • Beware that Packer will spin up a p2.xlarge instance to build the image, and shut it down afterwards (even though it might seem as if it’s running locally).
  • A non-default VPC is required to run the p2.xlarge instance. Packer doesn’t handle this, so you’ll have to create it beforehand.
  • A subnet in that VPC (with an internet gateway attached) is also required, and will likewise have to be prepared beforehand.

(I bootstrap AWS infrastructure using Terraform, and that’s how I created the VPC+subnet above, but that’s for another post.)
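
If you’d rather not pull in Terraform for a one-off build, the same VPC + subnet setup can be done with the AWS CLI. A rough sketch; the CIDR blocks are just examples, and the <...> IDs are placeholders you’d substitute from each call’s output:

```sh
# One-off VPC + subnet with a route out through an internet gateway.
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id <vpc-id> --cidr-block 10.0.0.0/24
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id <igw-id> --vpc-id <vpc-id>
aws ec2 create-route-table --vpc-id <vpc-id>
aws ec2 create-route --route-table-id <rtb-id> --destination-cidr-block 0.0.0.0/0 --gateway-id <igw-id>
aws ec2 associate-route-table --route-table-id <rtb-id> --subnet-id <subnet-id>
aws ec2 modify-subnet-attribute --subnet-id <subnet-id> --map-public-ip-on-launch
```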

When all is done, the AMI will be created in the region you’ve specified. Beware that, since it is stored as an AMI entry + EBS volume snapshot on your account, you’ll have to keep paying a (tiny) amount for it if you choose to keep it around.
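
Cleaning it up later is two AWS CLI calls; deregister the image, then delete the snapshot backing it (the IDs below are placeholders):

```sh
# Stop paying for the AMI: remove the image entry, then its EBS snapshot
aws ec2 deregister-image --image-id <ami-id>
aws ec2 delete-snapshot --snapshot-id <snapshot-id>
```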

And finally, I can confirm that the generated AMI is Lesson 1: Dogs and Cats compatible. Hoping this is useful to others too.

Looking forward to the part 2 sessions. :boom:


(Didn’t want to ping you, @jeremy, but I’ve included the cudnn-..tgz hosted on http://files.fast.ai in the public gist. I’m hoping that’s alright, but still double-checking.)

Do we need AWS for Part 2, or is it sufficient to have Paperspace? How about my own machine? I have a 4 GB GTX 970; would that be enough?

As long as you have a GPU that is CUDA/cuDNN compatible with the pytorch version used by the fastai library, you should be fine.
I can confirm that both AWS GPU machines and Paperspace GPU machines are compatible in that regard.
I have no idea about the GTX card you’ve mentioned. If you want to use it instead of the AWS/Paperspace options, be prepared to do some research and testing on your own.
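
A quick sanity check on any machine (assuming pytorch is already installed) is something like:

```sh
# Should print True, the GPU name, and the cuDNN version number
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0), torch.backends.cudnn.version())"
```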

And when it comes to GPU memory, more is always better, as you can use larger mini-batches. 4 GB is still okay, but do expect longer training times.

Thank you. Noted.