Salamander: AWS Spot Instances made easy


(Jeremy Howard) #21

I think two suboptimal metrics is better than zero metrics. :slight_smile: Often it’s best to just use your judgement, rather than come up with some perfect algorithm…


(Phani Srikanth) #22

Thanks for your pointers Jeremy. I’ll try adding them soon!


(Ashton Six) #23

@jeremy You mentioned AWS gave fast.ai $250k of compute credits a while ago. We’d need to check it’s ok with Amazon, but we could totally make Salamander 100% free for students & international fellows if you’d like - I’d create a form accessible only to you, which would let you give users free access to Salamander via fast.ai’s compute credits. What do you think?


(Ashton Six) #24

@binga I added pricing to the providers & linked datasheets to the GPUs. What do you think?

PR here:


(Jeremy Howard) #25

We gave out those credits to the last class - but if we can get more credits, that would be very helpful!


(Krisztian Kovacs) #26

I gave Salamander a try, thanks for setting it up! Here are my thoughts so far.

The good:

  • The web interface is super intuitive and it is really easy to get started.
  • The price seems competitive.
  • I really like how it keeps track of the $$ you spent. Very transparent!
  • Minor benefit: the $1 credit lets you try out the system without committing.

The bad:

  • After I set up my machine, I couldn’t start it for a few hours. I forget the exact message, but it said to try again in 10 minutes (and kept saying so for quite a while). Maybe demand was high at that particular time?
  • I couldn’t upload my preexisting SSH key. When I clicked on the upload button (on this page: https://salamander.ai/setup-access-keys), nothing happened. I was using Ubuntu 16.04 and Firefox 62.0.

Things I don’t understand (I’m probably missing something simple, please let me know):

  • I couldn’t import fastai from a different folder. I tried modifying the .bashrc file to add fastai (using this guide), but that didn’t help. I also tried sys.path.append at the top of my Jupyter notebook, pointing to the fastai folder, but that didn’t help either. What am I missing?

(Ashton Six) #27

@Krisztian Thanks for taking a look and saying all those nice things :slight_smile:

There’s been a K80 supply deficit during the last week. The (simplified) algorithm for requesting a server goes:

1. request a server
2. check AWS for the request status every 2 seconds
3. give up if the status code indicates capacity issues
4. give up if 15 seconds elapse & AWS hasn't agreed to fulfill the request
5. wait 60 seconds for the server to start running, give up & cancel the request if it takes longer
6. lock the server the user tried to start for a random interval of between 1 and 20 minutes
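
The steps above can be sketched as a small polling loop. This is only an illustration, not Salamander's actual code: the status strings, the `poll_status` callback, and the function name are all hypothetical, and two details are assumptions (the 60-second window is counted from the moment the request is fulfilled, and the lock is applied whenever the launch gives up).

```python
import random
from typing import Callable, Optional

POLL_INTERVAL_S = 2       # step 2: poll every 2 seconds
FULFIL_TIMEOUT_S = 15     # step 4: give up if not fulfilled by then
START_TIMEOUT_S = 60      # step 5: give up if not running by then
LOCK_RANGE_MIN = (1, 20)  # step 6: random lock, in minutes (inclusive)


def try_launch(
    poll_status: Callable[[float], str],
    sleep: Callable[[float], None],
    rng: Optional[random.Random] = None,
) -> dict:
    """Simulate one launch attempt.

    poll_status(elapsed_seconds) is a hypothetical stand-in for asking AWS
    about the request. It returns one of the made-up statuses "pending",
    "capacity-error", "fulfilled" or "running".
    """
    rng = rng or random.Random()

    def give_up(reason: str) -> dict:
        # Assumption: the lock in step 6 follows every failed attempt.
        lock = rng.randint(*LOCK_RANGE_MIN)
        return {"ok": False, "reason": reason, "lock_minutes": lock}

    elapsed = 0.0
    fulfilled_at = None
    while True:
        status = poll_status(elapsed)
        if status == "running":
            return {"ok": True, "reason": "running", "lock_minutes": None}
        if status == "capacity-error":
            return give_up("capacity")
        if status == "fulfilled" and fulfilled_at is None:
            fulfilled_at = elapsed
        if fulfilled_at is None and elapsed >= FULFIL_TIMEOUT_S:
            return give_up("not fulfilled in time")
        if fulfilled_at is not None and elapsed - fulfilled_at >= START_TIMEOUT_S:
            # A real implementation would also cancel the request here.
            return give_up("start timeout")
        sleep(POLL_INTERVAL_S)
        elapsed += POLL_INTERVAL_S
```

Passing `time.sleep` as `sleep` makes the loop wait in real time; passing a no-op makes it easy to test the timeouts without waiting.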

Whilst AWS will often manage to provision a K80 if given longer, those servers often shut down prematurely. I’m still experimenting with the precise algorithm & talking with AWS support. Within the next few weeks I’ll start estimating availability from the rate of failed requests (AWS doesn’t provide that information) and make sure deficits are displayed before you try to launch a server, so you don’t have to keep retrying. V100 GPUs have been totally fine. Very soon you’ll be able to resize storage & change hardware after launching a server, which will make it easier to avoid this issue. I think the best use-case for this is helping people switch to low-cost general-purpose hardware when they just want to write code or review a notebook, for example.

Right now Salamander only requests servers in the cheapest availability zone. I’m considering requesting servers in every availability zone within 10% of the lowest price and cancelling all requests except the first one that completes successfully. This should improve the K80 supply issues and instance startup time.
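
A minimal sketch of the zone-selection half of that idea, assuming prices arrive as a plain mapping from zone name to hourly price. The function name and the example zones and prices are made up for illustration; issuing the requests and cancelling the losers is not shown.

```python
from typing import Dict, List


def zones_within_tolerance(prices: Dict[str, float], tolerance: float = 0.10) -> List[str]:
    """Return zones priced within `tolerance` of the cheapest, cheapest first."""
    if not prices:
        return []
    lowest = min(prices.values())
    cutoff = lowest * (1 + tolerance)
    candidates = [zone for zone, price in prices.items() if price <= cutoff]
    # Sort by price, then by name so the order is deterministic on ties.
    return sorted(candidates, key=lambda zone: (prices[zone], zone))


# Hypothetical prices in dollars per hour.
example_prices = {"zone-a": 0.30, "zone-b": 0.32, "zone-c": 0.34}
selected = zones_within_tolerance(example_prices)
```

With these made-up numbers the cutoff is about $0.33, so `zone-a` and `zone-b` qualify and `zone-c` does not.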

Did you select the “fastai” kernel after opening the notebook? By default it’s Python 3. Within the next few days I’ll change that so any notebooks within the “~/fastai” directory use the “fastai” kernel by default.

I’ll take a look at uploading SSH keys today.

Edit: the SSH key issue was specific to Firefox and is now resolved.


(Ashton Six) #29

@Krisztian FYI, using sys to control where modules are imported from is much easier than putting your notebooks in a particular location or using environment variables. Put this at the top of each notebook:

%reload_ext autoreload
%autoreload 2
%matplotlib inline

import sys
sys.path.append('/home/ubuntu/fastai')

You’ll then be able to import fastai like this:

from fastai.imports import *
from fastai.torch_imports import *
from fastai.learner import *
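
One caveat with a bare `sys.path.append` is that a mistyped directory is accepted silently and only shows up later as an ImportError. A small guard can catch that earlier. This is a sketch with a hypothetical helper name, not part of fastai or Salamander:

```python
import sys
from pathlib import Path


def add_to_sys_path(directory: str) -> str:
    """Append `directory` to sys.path, failing loudly if it does not exist."""
    path = Path(directory).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"No such directory: {path}")
    resolved = str(path)
    if resolved not in sys.path:
        # Avoid duplicate entries when the notebook cell is re-run.
        sys.path.append(resolved)
    return resolved
```

In a notebook this would replace the `sys.path.append(...)` line, for example `add_to_sys_path('/home/ubuntu/fastai')`.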

(Krisztian Kovacs) #30

Thanks, I fixed the path to ‘/home/ubuntu/fastai’ and it all worked! (I did have to install some fastai dependencies first, though.)


(Krisztian Kovacs) #31

My $10 credit purchase just got processed, and I noticed that I also had to pay a $1.75 tax.

That means that the cost comparison is misleading (Paperspace $0.51/hour is inclusive of tax I believe). Including tax, the total cost of the K80 machine is $0.423 (not $0.36) per hour.

That’s still cheaper than Paperspace, but it would be nice to have clarity in terms of pricing.
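
The tax-inclusive figure above follows from the numbers in the post: $1.75 of tax on a $10 purchase is a 17.5% rate, applied to the $0.36 hourly price. A quick check, with a hypothetical function name:

```python
def price_including_tax(list_price: float, credit_purchased: float, tax_paid: float) -> float:
    """Scale a pre-tax price by the tax rate implied by a credit purchase."""
    tax_rate = tax_paid / credit_purchased
    return list_price * (1 + tax_rate)


# Figures from the post: $10 credit, $1.75 tax, K80 at $0.36/hour before tax.
k80_hourly = price_including_tax(0.36, credit_purchased=10.00, tax_paid=1.75)
print(f"${k80_hourly:.3f} per hour")
```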


(Ashton Six) #32

@Krisztian thanks for bringing up the tax. I thought Salamander was charging tax exactly the same way Paperspace & AWS were, but on further investigation I’ve discovered they only hide VAT from European customers like me (link). Right now I’d like to focus entirely on things like service reliability and design issues, but I’ll start to address the legal paperwork (presumably?) needed to charge international customers less tax in a few weeks’ time.


(Karl) #33

What is the best way to upload large datasets? Is there a way to get the Kaggle API working through Salamander?


(Ashton Six) #34

@KarlH After launching your first server, click “Setup Access Key”. Follow the steps and you’ll soon be able to connect via SSH from your terminal.

[Screenshot of Salamander showing the “Setup Access Key” button]

Once connected, I recommend installing the official Kaggle API: https://github.com/Kaggle/kaggle-api