I have been trying to create a VM with a P100 GPU in an Asia zone (asia-east1-a/c). I’m constantly getting this error:
The zone ‘projects/steel-melody-212913/zones/us-west1-b’ does not have enough resources available to fulfill the request. Try a different zone, or try again later.
I’ve also tried Europe (europe-west4-a/c) and US (us-west1-b, us-central1-c) zones, but the problem persists. Could you please let me know what can be done? How is it even possible that a P100 with a high-mem CPU is unavailable in all these zones? I’ve been trying for a full day, so it doesn’t seem temporary.
I’ve also been having the same issues, but following the advice given earlier I created a preemptible instance and it worked. So add the --preemptible flag to the command used to create the instance.
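For reference, a minimal gcloud sketch of what that looks like — the instance name, zone, and machine type here are just placeholders, adjust them to your setup:

```shell
# Request a preemptible n1-highmem-8 instance with one P100 attached.
# GPU instances must use --maintenance-policy=TERMINATE (no live migration).
gcloud compute instances create my-fastai-instance \
    --zone=europe-west4-a \
    --machine-type=n1-highmem-8 \
    --accelerator=type=nvidia-tesla-p100,count=1 \
    --maintenance-policy=TERMINATE \
    --preemptible
```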
If you are on the free tier you have a very low priority on non-preemptible instances, so most of the time you get the “not enough resources” error. The chance of getting a preemptible instance is higher, because Google can just turn it off if someone else needs it.
With a real paying GCP account (no free credits left) I haven’t had any problems getting a non-preemptible instance (even when I got the error with my other account at the same time).
So I guess they have enough resources if you are willing to pay the ~$2/h; otherwise your priority for the instances is just too low.
Over the weekend I was lucky to get a preemptible P100 instance in europe-west1-d that wasn’t preempted once. Usually it will be preempted after 30-60 minutes.
It’s probable that they have fewer resources reserved for free-tier users. I’ve been using GCP for two years now. In the first year I had no problems with preemptible VMs; I could use them for multiple hours without any issue. Last year, preemptible VMs started getting turned off after a few minutes, but non-preemptible VMs still worked fine. Now I’m not able to create a non-preemptible VM at all. So what changed? I guess one has to pay now to use a VM, as you suggested. Although you get credits, they are of no use if the VM won’t start.
I have tried to set up GCP machine as advised and run into the following problems:
Quotas: for an “n1-highmem-16” instance you need 16 CPUs, but the default quota is 8 (solution: increase the CPU quota for “us-west1-b” from 8 to 16 and the overall CPU quota from 12 to 16)
CUDA driver: installing fastai2 upgrades torch from 1.4.0 to 1.6.0 and you get “The NVIDIA driver on your system is too old” AssertionError (solution: update the CUDA Toolkit to 11.0)
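Both of those can be checked from the shell before you fight the instance again; a quick sketch, assuming gcloud and the image’s torch install are available (the region name is just an example):

```shell
# Quota: list the region's quotas and compare the CPUS metric against
# the 16 vCPUs that an n1-highmem-16 instance needs.
gcloud compute regions describe us-west1 --format="json(quotas)"

# Driver mismatch: compare the driver the system provides with the CUDA
# build torch expects, and check whether torch can actually see the GPU.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```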
I’m using GCP and got everything running.
When I execute the first cell (imports) in the notebooks (e.g. in notebook 01_intro) I get the following ImportError:
ImportError: cannot import name ‘mobilenet_v2’ from ‘torchvision.models’ (/opt/conda/lib/python3.7/site-packages/torchvision/models/__init__.py)
I pip installed the fastai2 library and updated it again. The error is still the same.
Any idea?
I believe the instructions at the top of the thread are a bit out of date. I ran into the same issue and resolved it by uninstalling all of the existing fastai packages and then installing fastai (not fastai2 – the latest version of the fastai package is fastai v2). Then the code ran (though there are still some outstanding problems in my setup I’m trying to resolve).
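Concretely, the clean-up I mean looks something like this (your environment may have only some of these packages installed, so the uninstall list is a guess at the usual suspects):

```shell
# Remove the old split packages, then install the unified fastai v2
# (pip will pull in fastcore and compatible torch/torchvision itself).
pip uninstall -y fastai fastai2 fastcore
pip install fastai
```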
This how-to post was also posted in the Fastai2 and new course thread, but it seemed more appropriate here in the GCP Platform thread.
Please note that fastai 2 requires torch-1.6.0 and torchvision-0.7.0. The CUDA drivers on the platform image are 10.1, which is too old for torch-1.6.0. There are no pytorch images in the deeplearning-platform-release family with 10.2 or 11 CUDA drivers. However, the drivers can be updated to 10.2 per @micstan
Here are the steps I followed to set up the GCP image with the new release and the book:
Run a notebook with training to make sure the GPU is being used, by looking at the training times for the epochs and checking the sm and mem columns in the output of nvidia-smi dmon
nvidia-smi dmon
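If you’d rather not eyeball those columns, here is a small stdlib-only sketch that pulls the sm and mem columns out of nvidia-smi dmon style output; the sample text below is made up for illustration, and real column layouts can differ between driver versions:

```python
# Parse nvidia-smi dmon style output and report average sm/mem utilization.
SAMPLE = """\
# gpu   pwr  gtemp  mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W      C      C     %     %     %     %   MHz   MHz
    0   140     62      -    87    45     0     0   715  1328
    0   138     63      -    91    48     0     0   715  1328
"""

def avg_utilization(dmon_text):
    header = None
    sm_vals, mem_vals = [], []
    for line in dmon_text.splitlines():
        if line.startswith("#"):
            cols = line[1:].split()
            if "sm" in cols:          # the first comment row names the columns
                header = cols
            continue
        if header is None or not line.strip():
            continue
        fields = line.split()
        sm_vals.append(float(fields[header.index("sm")]))
        mem_vals.append(float(fields[header.index("mem")]))
    return sum(sm_vals) / len(sm_vals), sum(mem_vals) / len(mem_vals)

sm, mem = avg_utilization(SAMPLE)
print(f"avg sm: {sm:.1f}%  avg mem: {mem:.1f}%")  # nonzero sm means the GPU is busy
```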
Perhaps others will have a more elegant solution, but for something quick to get started I haven’t run into any issues running fastai v2 and notebooks on GCP this way.
Cheers and many thanks for all the stellar work on the course, book and API! Mark
@markphillips, I just initiated a doc for the setup (https://github.com/fastai/course20/pull/9). Google Cloud now has a simplified method using “AI Platform” that I believe can be easier for many people. Please feel free to modify it. It works for me, but I’m not an expert in GCP, so it would be great if others could edit it.
Nice! I’ll bet you’re right about this being easier for many people. An easier option with fewer knobs, plus an option with pretty much everything, should cover all the bases.
Is your post above necessary to use a Tesla T4 on GCP? When I try to run language_model_learner(qdl, AWD_LSTM, metrics=[accuracy, Perplexity()], wd=0.1).to_fp16()
I get the error AssertionError: Mixed-precision training requires a GPU, remove the call to_fp16.
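That assertion comes from fastai checking for a CUDA device, so a quick first check is whether torch can see the GPU at all — a minimal sketch, guarded in case torch isn’t importable in your environment:

```python
# to_fp16() asserts that a CUDA device is available.  If torch can't see
# a GPU, mixed precision can't be used, so fall back to plain fp32.
try:
    import torch
    has_gpu = torch.cuda.is_available()
except ImportError:      # torch not installed in this environment
    has_gpu = False

print("use to_fp16()" if has_gpu else "skip to_fp16(): no GPU visible")
```

If this prints the second line on a GPU instance, the driver/toolkit on the image likely doesn’t match the installed torch build.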
Glad it’s working with the new drivers! As you note, the github install is not required unless you want an editable version of fastai 2 and fastcore. If you don’t want that, you could simplify the process by using conda:
conda install -c fastai -c pytorch fastai
If you want the fastbook, the github clone will get the notebooks for that…