Cloud GPU services vs having your own, which is better for AI?

Howdy howdy y’all,

I’ve recently been thinking of building a desktop workstation, and have been trying to find information on the cost and performance of buying your own GPU vs. renting one from a service like Lambda Labs.

The cheapest option that Lambda Labs provides is their RTX 6000 (I realize it’s an older card, but for the sake of argument) at $0.50/hr. If you were to buy an RTX 6000 instead, the average price I’ve seen is around $4000.

It seems like renting the GPU from Lambda Labs is cheaper for the first 8000 hours of renting (nearly a year of continuous use).
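
For reference, the back-of-the-envelope math behind that figure, using only the $0.50/hr and ~$4000 numbers above:

```python
# Break-even point: renting an RTX 6000 vs. buying one outright.
# The figures are just the ones quoted above, not exact market prices.
RENTAL_RATE = 0.50      # Lambda Labs RTX 6000, $/hr
PURCHASE_PRICE = 4000   # approximate purchase price of an RTX 6000, $

break_even_hours = PURCHASE_PRICE / RENTAL_RATE
print(f"Break-even after {break_even_hours:.0f} GPU-hours")            # 8000 hours
print(f"~{break_even_hours / 24:.0f} days of 24/7 use")                # ~333 days
print(f"~{break_even_hours / (8 * 365):.1f} years at 8 hrs per day")   # ~2.7 years
```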

That means buying your own RTX 6000 is perhaps not the best choice unless you’re running it continuously for over a year.

However (and this is where my confusion comes in), I’ve read some articles saying that cloud GPU performance is bottlenecked and falls behind the theoretical performance of the card.

In which case it’d be better to compare the cloud RTX 6000 with a personal 3090 that’s a quarter of the cost.

Is it true that cloud GPUs are bottlenecked significantly enough that purchasing a cheaper, theoretically less powerful GPU is better in the long run?
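
For what it’s worth, here’s how I’ve been trying to frame that trade-off as a rough calculation. This is just a sketch: the 80% efficiency figure below is a made-up placeholder (it would have to come from a real benchmark), and it assumes the two cards are roughly comparable on the workload, which also needs checking.

```python
# How a cloud-side slowdown changes the comparison against a cheaper local card.
# CLOUD_EFFICIENCY is purely hypothetical; it should come from a real benchmark.
# This also assumes the cloud RTX 6000 and a local 3090 are roughly comparable
# on the workload in question.
CLOUD_RATE = 0.50        # $/hr for the cloud RTX 6000
CLOUD_EFFICIENCY = 0.80  # hypothetical fraction of expected speed the cloud card delivers
LOCAL_PRICE = 1000       # a personal 3090 at ~1/4 the RTX 6000 price, as above

# Paying for an hour of cloud time only buys CLOUD_EFFICIENCY hours' worth of work.
effective_cloud_rate = CLOUD_RATE / CLOUD_EFFICIENCY
break_even_hours = LOCAL_PRICE / effective_cloud_rate

print(f"Effective cloud rate: ${effective_cloud_rate:.3f} per hour of work")    # $0.625
print(f"The 3090 pays for itself after ~{break_even_hours:.0f} hours of work")  # ~1600
```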

Thanks!
Dylan


Hi Dylan

Off topic from your question, but there is also the cost of electricity, the heat the card generates, and the possibility of damage or theft, plus the potential loss if you resell it. There are also contract rules that apply when using a GPU for hobby and academic work.

I noticed Colab Pro was much faster than Microsoft Azure for the same notebook when I tried them the other day (though a sample of one is not really a sample). There is also the TPU option if you are using TensorFlow rather than PyTorch.

So it depends on how frequently and for how long you will be training, because some cloud providers do have usage limits.

Regards Conwyn


I think the RTX 3090 is dropping in price, especially come mid-October when the 4090s hit store shelves. I don’t feel a 4090 will be a huge upgrade over the 3090, so I would consider shopping for a set of 3090s in late October. Once the 4090 Ti hits, though, that’s a different story: rumored to be packed with 48 GB of VRAM plus all the extra tensor cores, it would be a huge boost over the 3090.

I would say that when ETH mining was still a thing, GPUs were too expensive; now they have dropped 25-50% in price, but cloud prices didn’t drop, so it’s more favorable to rig up your own.

As for electricity/heat/power draw, you can cap the 3090 at 270 W and lose about 5% performance, or at 200 W and lose around 40%.
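
If you want to try that, here is a minimal sketch of setting the cap from Python by shelling out to nvidia-smi. The 270 W value is just the figure above; changing the limit needs admin/root rights, and it resets on reboot unless persistence mode is enabled.

```python
import subprocess

# Cap the power limit of GPU 0 at 270 W via nvidia-smi.
# Requires admin/root privileges; 270 W is the value discussed above.
GPU_INDEX = "0"
POWER_LIMIT_WATTS = "270"

# Show the current power readings and limits first.
subprocess.run(["nvidia-smi", "-i", GPU_INDEX, "-q", "-d", "POWER"], check=True)

# Apply the new cap (resets on reboot unless persistence mode is on).
subprocess.run(["nvidia-smi", "-i", GPU_INDEX, "-pl", POWER_LIMIT_WATTS], check=True)
```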

Also, if you have GPUs rigged up at home, you can always earn something off them by connecting them to a service like https://gpux.ai and renting them out to other users.


Not sure about cloud GPU performance bottlenecks (I’ve read something about scaling up GeForce cards causing severe bottlenecks, since GPUDirect peer-to-peer is disabled on them and the CPU has to gather the gradients instead, but I didn’t think that was the case when scaling A6000s; again, not sure here, though there is a quick way to check it yourself, sketched after the links below). Here are a few solid resources to help you with your search:

Lambda Labs Best GPUs
FSDL cloud gpu comparison and benchmarks
FSDL Lecture 2 GPU section
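
On the peer-to-peer point above: if you can get onto a multi-GPU machine (cloud or local), a minimal PyTorch check of whether the GPUs can reach each other directly might look like this. Just a sketch, assuming PyTorch with CUDA and at least two visible GPUs.

```python
import torch

# Check GPUDirect peer-to-peer availability between every pair of visible GPUs.
# If peer access is unavailable (as reported for GeForce cards in some setups),
# gradient exchange has to go through host memory, which is the bottleneck
# mentioned above.
n = torch.cuda.device_count()
print(f"{n} CUDA device(s) visible")

for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'available' if ok else 'NOT available'}")
```
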
Hope this helps


My experience is that running on your own machine can be significantly faster than an equivalently spec’d card in the cloud, though it has been several years since I’ve run things in the cloud. I think the 3090 with 24 GB of VRAM provides good bang for your buck, especially now that prices have dropped.

It all depends on how much you’re going to use it, but I made the switch after several $200/mo AWS bills. Going from a K80 to a 1080 Ti was dramatically faster, somewhere around 3-10x I think.

If you’re curious, I can run a speed test on a dataset on my machine so you can compare it to running in the cloud. I did run a test on the IMDB language models and posted the results here:

The results will vary based on the model type. I didn’t note which GPU I got in Colab when I did this test, but I’m guessing it was a T4 or P100. It was not a K80.
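
If you want something quicker than a full training run for that comparison, a rough timing script along these lines gives a ballpark number you can run on both machines. It assumes PyTorch with CUDA; the matrix size and iteration count are arbitrary choices.

```python
import time
import torch

# Rough throughput check: time a batch of large half-precision matrix
# multiplications on the GPU, then run the same script locally and on the
# cloud instance and compare. Sizes and iteration counts are arbitrary.
device = torch.device("cuda")
n = 8192
a = torch.randn(n, n, device=device, dtype=torch.float16)
b = torch.randn(n, n, device=device, dtype=torch.float16)

# Warm-up so first-launch overhead doesn't skew the timing.
for _ in range(3):
    torch.matmul(a, b)
torch.cuda.synchronize()

iters = 50
start = time.time()
for _ in range(iters):
    torch.matmul(a, b)
torch.cuda.synchronize()
elapsed = time.time() - start

tflops = iters * 2 * n**3 / elapsed / 1e12
print(f"{elapsed / iters * 1000:.1f} ms per matmul, ~{tflops:.1f} TFLOPS")
```

Note this only measures raw compute, so it won’t surface I/O or data-loading bottlenecks; a short run of your actual training loop is still the better comparison.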
