Vast.ai: Easy Docker-based Peer GPU Rental (Training costs 3x to 5x less)

Deep learning uses lots of compute; we (@jcannell and I) created Vast.ai to help lower the cost of all that compute. Here’s a 30 second demo:

We’ve aggregated together hardware, performance, and cost data for both current cloud offerings and peer rentals into one place to make it easier to compare and rank. If you’re interested in reducing your training costs you can rent a consumer machine and get about 3 to 5 times more performance per dollar.

Edit: The fastai $20 credit promotion has all been claimed. New users still receive enough credit for 5 to 10 hours of GPU time, i.e. $1.

We support the fast.ai docker image provided by paperspace - just select it from the list when renting an instance. (We also fixed a minor issue with the base container that prevented lesson 6 from working - let us know if you find any others!)

To back up the 3x to 5x lower cost claim, we can use an example from Stanford’s Dawnbench CIFAR10 competition. The winning entry uses a single V100 to train in 6:45 for a cost of $0.26 on Paperspace. Using the same code as the 2nd place entry (bkj), you can train in ~13 minutes on a single 1080Ti, for a cost of ~$0.05 on our service.

To reproduce that result, select the pytorch/pytorch image, connect to the jupyter notebook, open a terminal (new => terminal), and then run the following commands in the container:

apt install git
git clone https://github.com/bkj/basenet.git
cd basenet
git checkout 49b2b61
pip install -r requirements.txt
python setup.py install
cd examples
python cifar10.py --download

And finally, if you happen to have a powerful deep learning rig that you aren’t using very much, you could use our service to rent it out.

Feedback much appreciated. We are also available on our discord server.

16 Likes

Thanks for taking the time to make sure fast.ai works well, and to provide a nice reproducible performance example. The service looks pretty nice based on my quick scan! Looking forward to hearing how folks go with this.

One suggestion: please consider highlighting that people shouldn’t consider their data safe/secure on this service. Your current coverage of this issue is buried deep in the FAQ, and isn’t as clear as it could be. Instead, all new users should be made aware of this when they set up their first instance, because it’s important and (for many) unexpected.

13 Likes

Has anyone tested hosting an instance on Ubuntu 18.04?

Thanks in advance!

Hi @lahwran,

I’m sorry, but I’m very new to the cloud.

Please let me know if I did it correctly or not.

I went to Create, then Select Image, where I selected paperspace/fastai. Since I don't know much about the requirements needed to run the DL/ML notebooks, I sorted by Price (Asc) and selected (RENT) the first one.

Then I went to Instances and started the instance. It's really easy, just like running a jupyter notebook locally. But now I have stopped the instance and the Lifetime counter is still running. Do I have to destroy it?

I’m guessing the billing depends on the Lifetime :thinking:

Now, if I destroy it, will any changes made to the notebook stay?
And if I want to get the notebook onto my computer, what should I do?

Regards,
Sumit

@Jifu - We don't support Ubuntu 18.04 yet for hosting. We have had a couple of hosts try it anyway and run into various problems. We will of course support it at some point in the future.

Thanks for your reply! As soon as support for Ubuntu 18.04 is released, I'm looking forward to trying your service.

Good luck! :slight_smile:

@SKS If you stop an instance, you're not billed for the GPU time - billing is based on time spent running, not time spent existing. There is a small charge to keep an instance's data around, but almost all of an instance's cost is billed only while it's running. If you destroy an instance, any data on it will be lost. You can download your notebook using jupyter's download button.
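If you'd rather pull files down over ssh instead, scp works too. A minimal sketch - the PORT, INSTANCE_HOST, and notebook path below are placeholders; use the ssh connection details shown for your instance:

# Run this on your local machine; PORT and INSTANCE_HOST come from your instance's ssh details.
scp -P PORT root@INSTANCE_HOST:/notebooks/yournotebook.ipynb .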

@jeremy That's a pretty reasonable concern! We'd like to be transparent by design rather than throwing up a dialog box or the like. We're thinking about how best to display levels of provider verification and security on the cards.

(To be honest, I feel like just the fact that we share that we’re doing peer rentals at all is already much more transparent than competitors who are using peer providers but obfuscating it.)

Also, the fastai promo code is on its way to running out. Just a heads up! If you have trouble applying it, that might be why, but reply here, email us, or poke us on discord if you have any issues.

1 Like

Thanks, @lahwran :smiley:

1 Like

It looks good! Pretty smooth :slight_smile:
Access with Jupyter notebooks through the browser works fine, but when I tried ssh I was asked for a password. I checked my email to see if I had received one, but no luck.

@teeyare - So it shouldn’t ask for a password. If it does, that typically means your ssh key wasn’t set properly. We could use better ssh key checking to prevent this - but right now if there is any typo in the ssh key you have set, then ssh connect will ask for a password.
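If anyone wants a quick sanity check on their end - a generic sketch, nothing vast-specific - generate a keypair if you don't already have one, then paste the entire contents of the .pub file into the SSH key field with no line breaks or missing characters:

# Generate a keypair (accept the defaults if unsure).
ssh-keygen -t rsa -b 4096
# Print the public key; copy this whole single line into the SSH key field on vast.ai.
cat ~/.ssh/id_rsa.pub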

Is there some more info around about where these machines are and how to use them? I.e. are they typically people's home machines with GPUs connected through home internet? Is it possible to ssh into them, maybe through some kind of reverse proxy on the GPU machine that can only reach some whitelisted addresses? Is it possible to create and run new docker images, maybe after submitting them somewhere and getting them approved? Thanks.

@clipmaker - Yes, you can ssh into the machines, use scp, etc. Most of the machines are behind firewalls, but our proxy deals with that. Our GUI asks for your SSH public key before you rent a machine and sends it over for you. At one point we did allow custom docker images, but that was removed due to some issues; we will probably re-enable it in the near future. What docker image did you have in mind?

@jcannell Thanks! Do the machines have reasonable network connectivity for transferring data sets? I’m used to machines in data centers so if these are on home or office internet I’m not sure what to expect.

Re Docker: I was thinking it would be cool to have a image for Leela Chess Zero, lczero.org. Any interest?

Network benchmarks are listed on each rental offer's card. Many machines have pretty dang fast internet. We'll allow creation from custom images before too long, which should make that possible.
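If you want to sanity-check a machine's connection yourself after renting it, a rough-and-ready test is just to time a download from inside the container (the URL below is only an example of a large public test file):

# Quick download-speed check from inside the instance; wget prints the transfer rate.
wget -O /dev/null http://speedtest.tele2.net/100MB.zip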

@lahwran, thanks, I'll check there, though the rental cards are totally unreadable in my browser (Firefox 61 on Linux) - see the screenshot. Can you make a simple minimal version of the web site with just the info and no styling? That's almost always better.

Ok, I made an account, haven’t spun up any machines yet but will do so. Beta test comments:

  • I have not found any way to view the machine specs legibly in firefox on my laptop (see messed up screen shot above). The only way I can read them is on my phone (Firefox Klar). But then I have to scroll around because the fancy web framework doesn’t take the mobile screen into account (I thought that was the whole point of those frameworks).
  • I see an available balance that fluctuates randomly between $1.00 and $1.10.
  • No 2-factor authentication, not good for a service that allows running up enormous charges of real money
  • No easy way to retrieve the machine list as html (i.e. it’s populated by some JS bloat so curl doesn’t retrieve it). Really it would be great if you could just expose the JSON endpoint so I can retrieve the JSON and bypass the web front end. You can and should add a real API later but it would be a big help just being able to get the machine list without resorting to my phone
  • It’s disappointing that the $20 initial credit was limited-time when the announcement didn’t say this. I missed out on it by not signing up early. It’s understandable but could have been better communicated.
  • It’s not clear whether the server offers are basically dedicated servers. I.e. whether all the resources (cpu, ram) of the underlying host are available. There are some powerful machines listed at surprisingly low prices (even ignoring the GPU), which makes me wonder whether they are for real.
  • It would be nice to be able to pre-fund the account from a credit card (e.g. charge $10 to the card and add it to the account balance) instead of invoicing. If the balance runs out, suspend service. That gets rid of the possibility of running up a huge invoice if (say) a network interruption makes it impossible to shut down a machine. And I'd just rather not leave payment credentials on file. I use prepaid mobile phones for the same reason: fewer ways to get clobbered by billing mixups.
  • (added) For some reason when I got the announcement of vast.ai in the forum update email, I thought vast.ai was a spinoff project of fast.ai. It looks like it’s not affiliated, which is fine, but there is some possibility of confusion.
  • (added) (not a criticism just an observation) It looks to me like I’m seeing an upward trend in hourly rental prices for GTX cards over the past few days. I guess that’s good for you since it means people are finding out about the service so demand is building up. Is it the case though that console shows all the available servers, i.e. there are actually only a dozen or so available? If yes, you’ll obviously need more soon and I wonder if there will be enough to go around.

Anyway I’m looking forward to trying an instance.

4 Likes

I'm trying to figure out a workflow for using vast.ai. I'm guessing that each time I create a new vast.ai instance, I would have to port over all my data using scp. Can someone recommend an online storage service that I could easily use to transfer all my data into vast.ai each time I create a new instance?

Crap, I wasn’t getting notifications about this forum for a bit, my apologies!

The best way is to put it on a webserver somewhere and then download it with wget. I'd use Amazon S3 for this personally, since it's nice and barebones and I like that; you could also use a more consumer-focused tool - I think one person used MediaFire Pro, and I also found Jumpshare while googling around. The important thing is that it needs to give you a direct URL you can download with wget.
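For the s3 route, the pattern is roughly the following - a sketch only, where the bucket and file names are placeholders and the object has to be readable from the instance (public, or via a presigned URL):

# On your own machine: upload the dataset to s3 (public-read so wget can fetch it).
aws s3 cp mydata.tgz s3://yourbucket/data/mydata.tgz --acl public-read
# On the vast.ai instance: download and unpack it.
wget https://yourbucket.s3.amazonaws.com/data/mydata.tgz -P /data/
tar xzf /data/mydata.tgz -C /data/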

1 Like

@clipmaker, replying to your issues individually:

  • We'll have to fix the cards; I didn't realize fonts misbehaved that much on different OSes. A simple table view option could be nice anyway, so I'll add a view menu to set that. We're using some features of the fancy web frameworks, but even with automatic reflow they only do so much for complicated webapps - they make responsive design effortless for simpler things like blogs, and getting mobile working to exactly the same degree as desktop takes testing time we haven't had.
  • Fluctuating balance is pretty concerning; that must mean the preview balance is being redisplayed somehow. Your real balance would have been $1.10.
  • Noted, I’ll bump two factor auth up the todo a bit. Makes a lot of sense, I’d want it as well.
  • All the JSON APIs are intended for others to use as well, but we haven't documented them very well yet. You're welcome to poke around in the Chrome network inspector if you want (see the sketch after this list), and I'll make sure to put our partially tested CLI client up somewhere today.
  • Yeah, apologies for not clearly labeling it as limited.
  • They're docker instances on fixed slices of machines that are, underneath docker, dedicated servers. It's not dynamically allocated like AWS or Google Cloud, so if you rent all the GPUs on a machine, you'll have all the compute on it. E.g., the 8x instance that is very occasionally available will give you the full CPU power of that machine - which is quite a bit, I think. Some machines also don't have many GPUs relative to the amount of CPU they have. I'll add displaying the total number of GPUs on a machine to the todo list for that table.
  • We have pre-funding on the todo list, you’re not the first to request it.
  • Yeah, we’re unrelated, we actually picked the name quite a while ago when we incorporated for another project.
  • It varies; I think you may have seen that when someone rented out something like 20 instances all at once. Several of our hosts can bring many more machines over to vast if the need arises.
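To expand on the JSON point above: until it's documented, the quickest route is to copy the request the console makes out of the browser's network inspector and replay it with curl. A rough sketch - the endpoint path here is illustrative, so check the inspector for the real URL:

# Fetch the machine/offer list as JSON and pretty-print it.
# The path is illustrative; copy the actual request URL from the network inspector.
curl -s 'https://vast.ai/api/v0/bundles/' | python -m json.tool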

The best ways to contact us quickly are the chat on vast.ai, which we try to keep to a response time of 10 minutes or less at all times, or our discord, which we respond to within a few hours but which also has other hosts and clients on it. Apologies for the long response time here, though - I wish I'd noticed this within a day!

1 Like

I’ve figured out a pretty seamless process for using vast.ai… It takes about 2 minutes each time to get up and running on a new GPU instance. Thought I’d share in case anyone else is interested.

It consists of:

  1. Creating a new instance on the Vast.ai console (using the fast.ai OS Image and “Run interactive shell server, SSH”)
  2. Installing pip, awscli, and git.
  3. Creating my file structure, and downloading my code from github
  4. Activating the fast.ai python environment
  5. Starting Jupyter notebook
  6. Pulling the data I need for a particular project

Here’s the code I use for steps 2 thru 5:

#install pip, awscli, git
pip install --upgrade pip
echo "export PATH=~/.local/bin:$PATH" >> ~/.bashrc
source ~/.bashrc
pip install awscli --upgrade --user
apt install git
git config --global user.email "name@email.com"
git config --global user.name "First Last"
#create / update folder structure
cd /notebooks/fastai/
git pull
cd /notebooks
git clone https://github.com/YOURGIT.git
cd /data/
mkdir sav
mkdir localtoremote
#get ready to use jupyter
source activate fastai
aws configure
cd /notebooks
pip install jupyter; jupyter notebook --ip=127.0.0.1 --port=8080 --allow-root

Then in my Jupyter notebook for a particular project I’ll do something like:
!wget https://s3-us-west-2.amazonaws.com/youramazons3bucket/data/cifar10/cifar10.tgz -P /data/
!tar xvzf /data/cifar10.tgz -C /data/
!aws s3 cp --recursive s3://youramazons3bucket/data/sav/cifar10 /data/sav/cifar10

Hope this is helpful to someone.

4 Likes