I just started using the default fastai setup instructions for GCP.
The install and updates all went well. Notebooks start ok, data downloads, performance seems good. But I have not been able to complete a single run through of the resnet50 section of Lesson 1.
Four times today my instance has been shut down. I’ve tried restarting it immediately, and after delays of an hour. I’d have to describe it as unusable. I’m running the VM in ZONE=us-west1-b from Sydney.
Is there any advice on how to work with this environment effectively?
Just had a run through from RESNET50 to the end.
Might I add… I am really pleased with how smoothly the set-up and run worked.
You’ll actually face this problem quite often. From what I’ve seen, the zone us-west1-b is usually very busy. You can try another US zone that supports the GPU you’re using, or a zone somewhere in Europe. That should reduce the preemption frequency by a lot.
Is there any view which shows stats on preemption for various locations?
I have the impression that it has gotten a lot worse recently. Last year I worked through the entire fastai course on a preemptible P100 instance on GCP. Sure, it shut down once in a while but overall it worked fine. This year however I found the preemptible instance to be nearly unusable. So now I run as much as possible on Paperspace (or alternatively Colab) and I use my GCP credits only for a non-preemptible P100 GPU when I really need it.
But alternatively, I think there are stats available on the number of users in each zone. I’m not entirely sure and will have to check. You can try searching for it; I will too, and I’ll let you know for sure.
True! It’s almost unusable lately… I switched to Colab because my GCP instance would get preempted every 5 minutes.
But hopefully Google expands their server capacity, because it’s definitely superior (in my opinion only) to other platforms in speed, CUDA capacity, and even access to local disk…
Also agree it has become unusable (in the past it was very usable). The documentation claims preemption historically is only 5-15% per day (I’m experiencing more like 40% per hour!)
For reference, we’ve observed from historical data that the average preemption rate varies between 5% and 15% per day per project, on a seven-day average, occasionally spiking higher depending on time and zone. Keep in mind that this is an observation only: Preemptible instances have no guarantees or SLAs for preemption rates or preemption distributions.
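To put the two figures above side by side, here is a rough back-of-the-envelope sketch. It assumes preemption happens at a constant rate, independently in every interval, which is my simplification and not something the documentation promises. The function name and the 2-hour run length are made up for illustration.

```python
def survival_probability(preempt_rate, rate_period_hours, run_hours):
    """Chance that a run of `run_hours` sees no preemption.

    Assumes (simplification) a constant preemption probability
    `preempt_rate` per `rate_period_hours`, independent across periods.
    """
    if not 0.0 <= preempt_rate <= 1.0:
        raise ValueError("preempt_rate must be between 0 and 1")
    if rate_period_hours <= 0 or run_hours < 0:
        raise ValueError("durations must be positive")
    return (1.0 - preempt_rate) ** (run_hours / rate_period_hours)


if __name__ == "__main__":
    # Documented range (5% to 15% per day) versus the 40% per hour reported above.
    scenarios = [
        ("5% per day", 0.05, 24),
        ("15% per day", 0.15, 24),
        ("40% per hour", 0.40, 1),
    ]
    for label, rate, period in scenarios:
        p = survival_probability(rate, period, run_hours=2)
        print(f"{label:>13}: {p:.1%} chance that a 2-hour run finishes")
```

Under that assumption, 40% per hour leaves only a 0.6 × 0.6 = 36% chance of getting through two hours, while the documented daily rates would leave a 2-hour run almost always intact.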
It really became unusable. So either use normal instances or change platform. I just switched to running a normal VM (and yes, you can set up another Google account for the same credit card if you run out of free credits).
You can try changing the zone to resolve the issue.
Most people use the default configuration, us-west1-b.
You can change this zone manually or automatically here.
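In case it helps, here is a small sketch of how the zone fits into the create command. It only builds the command line and does not run it. The instance name, machine type and GPU type below are placeholders I picked, not values from the setup guide, so swap in your own. The flags themselves (`--zone`, `--machine-type`, `--accelerator`, `--maintenance-policy`, `--preemptible`) are standard `gcloud compute instances create` flags.

```python
import shlex


def create_instance_command(name, zone, preemptible=True, extra_flags=()):
    """Build a `gcloud compute instances create` command for one zone.

    Machine type and GPU type are placeholder assumptions; pass image and
    disk flags from your own setup guide through `extra_flags`.
    """
    args = [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        "--machine-type=n1-highmem-8",                   # placeholder
        "--accelerator=type=nvidia-tesla-p100,count=1",  # placeholder GPU
        "--maintenance-policy=TERMINATE",                # GPU VMs cannot live-migrate
    ]
    if preemptible:
        args.append("--preemptible")
    args.extend(extra_flags)
    return args


if __name__ == "__main__":
    # Same instance definition, two zones, with and without preemption.
    print(shlex.join(create_instance_command("my-fastai-instance", "europe-west4-a")))
    print(shlex.join(create_instance_command("my-fastai-instance", "us-west1-b",
                                             preemptible=False)))
```

Dropping `--preemptible` is what gives you a normal (more expensive) instance, which is the other workaround mentioned in this thread.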
Hope this helps
Addendum: list of GCP zones to choose from
NAME REGION STATUS NEXT_MAINTENANCE TURNDOWN_DATE
us-east1-b us-east1 UP
us-east1-c us-east1 UP
us-east1-d us-east1 UP
us-east4-c us-east4 UP
us-east4-b us-east4 UP
us-east4-a us-east4 UP
us-central1-c us-central1 UP
us-central1-a us-central1 UP
us-central1-f us-central1 UP
us-central1-b us-central1 UP
us-west1-b us-west1 UP
us-west1-c us-west1 UP
us-west1-a us-west1 UP
europe-west4-a europe-west4 UP
europe-west4-b europe-west4 UP
europe-west4-c europe-west4 UP
europe-west1-b europe-west1 UP
europe-west1-d europe-west1 UP
europe-west1-c europe-west1 UP
europe-west3-c europe-west3 UP
europe-west3-a europe-west3 UP
europe-west3-b europe-west3 UP
europe-west2-c europe-west2 UP
europe-west2-b europe-west2 UP
europe-west2-a europe-west2 UP
asia-east1-b asia-east1 UP
asia-east1-a asia-east1 UP
asia-east1-c asia-east1 UP
asia-southeast1-b asia-southeast1 UP
asia-southeast1-a asia-southeast1 UP
asia-southeast1-c asia-southeast1 UP
asia-northeast1-b asia-northeast1 UP
asia-northeast1-c asia-northeast1 UP
asia-northeast1-a asia-northeast1 UP
asia-south1-c asia-south1 UP
asia-south1-b asia-south1 UP
asia-south1-a asia-south1 UP
australia-southeast1-b australia-southeast1 UP
australia-southeast1-c australia-southeast1 UP
australia-southeast1-a australia-southeast1 UP
southamerica-east1-b southamerica-east1 UP
southamerica-east1-c southamerica-east1 UP
southamerica-east1-a southamerica-east1 UP
asia-east2-a asia-east2 UP
asia-east2-b asia-east2 UP
asia-east2-c asia-east2 UP
asia-northeast2-a asia-northeast2 UP
asia-northeast2-b asia-northeast2 UP
asia-northeast2-c asia-northeast2 UP
asia-northeast3-a asia-northeast3 UP
asia-northeast3-b asia-northeast3 UP
asia-northeast3-c asia-northeast3 UP
asia-southeast2-a asia-southeast2 UP
asia-southeast2-b asia-southeast2 UP
asia-southeast2-c asia-southeast2 UP
europe-north1-a europe-north1 UP
europe-north1-b europe-north1 UP
europe-north1-c europe-north1 UP
europe-west6-a europe-west6 UP
europe-west6-b europe-west6 UP
europe-west6-c europe-west6 UP
northamerica-northeast1-a northamerica-northeast1 UP
northamerica-northeast1-b northamerica-northeast1 UP
northamerica-northeast1-c northamerica-northeast1 UP
us-west2-a us-west2 UP
us-west2-b us-west2 UP
us-west2-c us-west2 UP
us-west3-a us-west3 UP
us-west3-b us-west3 UP
us-west3-c us-west3 UP
us-west4-a us-west4 UP
us-west4-b us-west4 UP
us-west4-c us-west4 UP
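If you want to pick alternatives out of a listing like the one above without scanning it by eye, something along these lines should work. It is a minimal sketch that assumes the whitespace-separated NAME / REGION / STATUS layout shown above. The function names are mine, and `ZONE_LISTING` is just a short excerpt of the table so the example runs on its own.

```python
from collections import defaultdict

# Short excerpt of the listing above, so the sketch is self-contained.
ZONE_LISTING = """\
NAME REGION STATUS NEXT_MAINTENANCE TURNDOWN_DATE
us-west1-b us-west1 UP
us-west1-c us-west1 UP
europe-west4-a europe-west4 UP
australia-southeast1-b australia-southeast1 UP
australia-southeast1-c australia-southeast1 UP
"""


def zones_by_region(listing):
    """Group zones that are UP by their region."""
    regions = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        name, region, status = fields[:3]
        if status == "UP":
            regions[region].append(name)
    return {region: sorted(names) for region, names in regions.items()}


def candidate_zones(listing, region_prefix, avoid=()):
    """Zones whose region starts with `region_prefix`, minus any to avoid."""
    found = []
    for region, names in zones_by_region(listing).items():
        if region.startswith(region_prefix):
            found.extend(n for n in names if n not in avoid)
    return sorted(found)


if __name__ == "__main__":
    # Zones near Sydney, then US zones other than the busy default.
    print(candidate_zones(ZONE_LISTING, "australia-"))
    print(candidate_zones(ZONE_LISTING, "us-", avoid={"us-west1-b"}))
```

Note that this only tells you which zones exist and are up. It says nothing about whether a zone offers your GPU or how often it preempts.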
I found this page, which shows availability of GPUs by region. I searched for ‘P100’, changed to my local region, deleted my old VM, created a new one, and have had excellent performance since.
That is a sample of a few hours - so who really knows???
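For anyone who would rather do the 'P100' search from the command line: `gcloud compute accelerator-types list` prints a table of GPU types per zone, and a filter like the sketch below can pull out the zones for one GPU. I am assuming the default NAME / ZONE / DESCRIPTION column layout here, and the sample rows are illustrative only, not a current availability list, so run the command yourself for real data.

```python
# Illustrative rows only (not live data), in the assumed
# NAME / ZONE / DESCRIPTION layout of `gcloud compute accelerator-types list`.
SAMPLE_ACCELERATOR_LISTING = """\
NAME ZONE DESCRIPTION
nvidia-tesla-k80 us-west1-b NVIDIA Tesla K80
nvidia-tesla-p100 us-west1-b NVIDIA Tesla P100
nvidia-tesla-p100 europe-west4-a NVIDIA Tesla P100
nvidia-tesla-p100 australia-southeast1-c NVIDIA Tesla P100
"""


def zones_offering(gpu_name, listing):
    """Return the sorted zones whose rows list exactly `gpu_name`."""
    zones = set()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        if fields[0] == gpu_name:
            zones.add(fields[1])
    return sorted(zones)


if __name__ == "__main__":
    for zone in zones_offering("nvidia-tesla-p100", SAMPLE_ACCELERATOR_LISTING):
        print(zone)
```

This shows where a GPU type is offered, not how heavily that zone is preempted, so the 'who really knows' caveat above still applies.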