Paperspace startup extremely slow

The Problem
I am using Paperspace’s free GPU to host my notebook. The first couple of times I used it, it was very slow, taking 15 to 30 minutes to provision. But recently it takes multiple hours. Today I have been trying to provision the machine for more than 8 hours. It’s practically unusable.

I’ve been searching the forums and it seems thousands of people use the free Paperspace GPU, so I am trying to understand what I am doing wrong.

More Context
I don’t mind paying to upgrade to a better GPU, but want to understand if I am doing something wrong first. For one of the assignments, I used Bing’s Search API to download <200 images of cocktails. I suspect this may be affecting the provisioning?

I’m also trying to avoid Colab since I don’t want to re-install dependencies & training data for every session.

I don’t think you’re doing anything wrong, it’s just not a very reliable service. I’m on the paid plan ($8/month) and even then the provisioning (for paid instances!) is often horribly slow.

If anyone has a solution to that, please share, but I suspect it’s a fault on Paperspace’s side.

For Colab, you can avoid uploading the training data every session by using Google Drive.
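
In case it helps, mounting Drive from a Colab notebook looks roughly like the sketch below. The `mount_drive` wrapper and the `/content/drive` mount point are just my choices for illustration; the `google.colab` module only exists inside a Colab runtime, so the function returns `None` anywhere else.

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab.

    Returns the mount point on success, or None when not running in Colab.
    """
    try:
        # This import only succeeds inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return None
    # Colab will prompt for authorization the first time in each session.
    drive.mount(mount_point)
    return mount_point


if __name__ == "__main__":
    where = mount_drive()
    if where is None:
        print("Not running in Colab; nothing mounted.")
    else:
        print(f"Drive mounted at {where}")
```

Once mounted, training data kept in Drive is visible as ordinary files under the mount point, so it does not need to be uploaded again each session.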

Unfortunately we are subject to the amount of traffic they are experiencing. I notice that when I am on Paperspace at really odd hours (early morning/late night) it works much better than during what I assume to be their peak hours of 5pm-9pm (people working on side projects after work).

They are still a fairly young startup so I imagine as they add more servers and grow, it will be a much better experience overall. Still the best cloud GPU option out there IMO.

It may be volume, as others note, but a while back the staff were concerned when my startup took more than 5 minutes, which may be a sign that I had stored large files in the wrong place.

If I remember correctly, storing models etc. on your shared drive is much faster than storing in the instance. I think the idea is that the shared drive is just a map to a running service, where storing in the instance means moving all the stored data across the wire during spin-up. And models/data can get large indeed.

I haven’t used Paperspace recently, so check with them / other users.

Thank you for the hint! But I think I’m doing that already. To be clear, with the shared drive you mean the persistent storage in the storage folder? That’s where I keep all code and models, to make sure nothing gets lost (which happened to me in other folders).

Yes, storage - I’d forgotten the name, as I haven’t run it in a while.

TBH for most of the exercises in the fast.ai class, I think Google Colab is just way less hassle than Gradient. It’s only once you’ve been training on a GPU for long periods of time that Colab really has any downsides. The instant startup is a huge advantage in terms of removing reasons not to practice deep learning.

Thanks - I’ll have to try it.

Thanks guys. I think it’s because I usually carve out time on Sunday afternoons to work on this, which is probably when they have the most traffic to their free tiers. Let me check out Colab.

There have been changes at Paperspace recently that (IMHO) severely hurt the experience of playing around with deep learning. You’ll notice in the interface that your notebooks might have a “V1” written beside the notebook name (i.e. below your username) - if so, you’re running on their legacy “V1” notebooks service. But as I’ll explain later, that’s where you want to be.

V1 is notoriously slow at the provisioning step you mention - my experiments show this is strongly related to the size of your /notebooks folder. Even just the git repository of fastbook at 400MB makes provisioning take several minutes, and if you use any decent fraction of your 8GB space allocation you’ll be taking an hour or more to shut down and spin up. However, /storage only has a 5GB allocation, so you’ll often find yourself writing models out to /notebooks during training because /storage will fill up with your data. Failing to clean those files up before shutdown is what causes the slow spin-up/shutdown.
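
If you want to check how much is sitting in /notebooks before you shut down, something like the sketch below should do it. It is plain standard-library Python; the function names are mine, and the /notebooks path is simply the folder described above, so adjust it if your layout differs.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def largest_files(path, top=5):
    """Return the largest files under path as (size_bytes, file_path) pairs."""
    sizes = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                pass
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    folder = "/notebooks"
    print(f"{folder}: {dir_size_bytes(folder) / 1e6:.1f} MB")
    for size, file_path in largest_files(folder):
        print(f"{size / 1e6:8.1f} MB  {file_path}")
```

The list of largest files is usually where saved models and downloaded datasets show up, which are the things worth moving or deleting before shutdown.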

Moving to V2 seems to dramatically improve spin-up/shutdown times - just create a new notebook from their templates to do this. However, the first thing you’ll notice is that you can’t run Free-P5000 machines on the free plan on V2 (contrary to their marketing materials). And Free-GPU instances are much slower than other free options out there, so you’ll be skipping those.

This means the only way you can keep using Free-P5000 instances is to stay on V1, so you’ll be forking notebooks - however they’ve now made it so forks can’t select Free-P5000 on the free tier either (and I haven’t tried it, but my guess from the interface is that if you switch a V1 notebook away from Free-P5000, you can’t switch back). The upside of this is that very few people are now using the Free-P5000 instances, so what used to be a lucky gamble to get your hands on one is now practically guaranteed every time you spin up your cherished V1 Free-P5000 instance.

Which means V1 notebooks that happened to have Free-P5000 instances active whenever the change happened just became gold dust - just keep your /notebooks folder free of large files.

(BTW - it’s not very expensive gold dust. A Free-P5000 instance has a GPU that is only 2/3 the speed of an AWS g4dn.xlarge instance, which costs about $0.16/hr on spot. Contrast that with Paperspace charging $0.78/hr for a P5000. Also keep in mind you can only run a single Free-P5000 instance at once, and it times out after 6 hours, so the full compute of a single run just saved you a whole $0.66 compared to paying AWS. If you’re learning one of the most highly sought-after skills on the planet, and yet your personal time is not worth more than that, you may wish to consider how you’re valuing your time…)
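
For what it’s worth, the sum behind that figure can be sketched as below, using only the rounded numbers quoted above (the variable and function names are mine). With exactly $0.16/hr it lands at about $0.64, so the $0.66 presumably reflects a slightly different spot price on the day.

```python
# Rounded figures from the post above; the spot price in particular is approximate.
AWS_SPOT_PER_HOUR = 0.16       # g4dn.xlarge on spot, USD per hour
PAPERSPACE_P5000_PER_HOUR = 0.78  # paid P5000, USD per hour
P5000_RELATIVE_SPEED = 2 / 3   # P5000 speed as a fraction of g4dn.xlarge
SESSION_HOURS = 6              # a Free-P5000 session times out after this long


def aws_equivalent_cost(hours, relative_speed, aws_rate):
    """Cost of doing the same amount of compute on the faster AWS instance."""
    aws_hours_needed = hours * relative_speed
    return aws_hours_needed * aws_rate


saved = aws_equivalent_cost(SESSION_HOURS, P5000_RELATIVE_SPEED, AWS_SPOT_PER_HOUR)
paid_p5000 = SESSION_HOURS * PAPERSPACE_P5000_PER_HOUR

print(f"AWS cost for one full free session's compute: ${saved:.2f}")
print(f"Same 6 hours on a paid Paperspace P5000:      ${paid_p5000:.2f}")
```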