Platform: Google Cloud Platform (GCP)

Has anyone found a workaround (or is there one)?

I have been trying to create a VM with a P100 GPU in the Asia zones (asia-east1-a/c). I constantly get this error:
The zone 'projects/steel-melody-212913/zones/us-west1-b' does not have enough resources available to fulfill the request. Try a different zone, or try again later.

I’ve also tried the Europe (europe-west4-a/c) and US (us-west1-b, us-central1-c) zones, and the problem persists. Could you please let me know what can be done? How is it even possible that a P100 with a high-mem CPU is unavailable in all these zones? I’ve been trying for a day now (it doesn’t seem temporary).

I’ve also been having the same issues, but following the advice given earlier I created a preemptible instance and it worked. So add the --preemptible flag to the command used to create the instance.
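If it helps anyone, the full create command looks something like this. The instance name, zone, and machine/accelerator types below are only placeholders; adjust them to your own project and quota:

```shell
# Create a preemptible GPU instance (all names and types here are examples).
gcloud compute instances create my-fastai-instance \
  --zone=us-west1-b \
  --machine-type=n1-highmem-8 \
  --accelerator=type=nvidia-tesla-p100,count=1 \
  --maintenance-policy=TERMINATE \
  --preemptible
```

Note that GPU instances must also set --maintenance-policy=TERMINATE, since GPU VMs cannot live-migrate during host maintenance.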

It works sometimes, but the connection closes after 2 minutes. It’s unreliable, and I want to run experiments for longer periods.

Why aren’t we able to create non-preemptible instances?

Here’s my theory about the issue.

If you are on the free tier, you have very low priority for non-preemptible instances, so most of the time you get the “not enough resources” error. Your chances of getting a preemptible instance are higher, because Google can just turn it off if someone else needs the capacity.

With a real paying GCP account (no free credits left) I haven’t had any problems getting a non-preemptible instance (even while I got the error with my other account at the same time).

So I guess they have enough resources if you are willing to pay the ~$2/h :slight_smile: otherwise your priority for the instances is just too low.

Over the weekend I was lucky to get a preemptible P100 instance in europe-west1-d that wasn’t preempted once. Usually it gets preempted after 30–60 minutes.

It’s probable that they reserve fewer resources for free-tier users. I’ve been using GCP for two years now. In the first year I had no problems with preemptible VMs; I could use them for multiple hours without any issue. Last year, preemptible VMs started turning off after a few minutes, though non-preemptible VMs still worked fine. Now I’m not able to create a non-preemptible VM at all. So what changed? I guess one has to pay now to use a VM, as you suggested. Although you get credits, they are not usable if the VM doesn’t start.


I have tried to set up a GCP machine as advised and ran into the following problems:

  1. Quotas: an “n1-highmem-16” instance needs 16 CPUs, but the default quota is 8 (solution: increase the CPU quota for “us-west1-b” from 8 to 16 and the overall CPU quota from 12 to 16).
  2. CUDA driver: installing fastai2 upgrades torch from 1.4.0 to 1.6.0, and you then get the AssertionError “The NVIDIA driver on your system is too old” (solution: update the CUDA Toolkit to 11.0).
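For what it’s worth, the “driver is too old” assertion is just a version comparison: each CUDA runtime requires a minimum NVIDIA driver version (for example, the CUDA 10.2 installer bundles driver 440.33). Here is a small sketch of that check in plain Python — the version table is illustrative, so confirm the exact minimums against NVIDIA’s CUDA release notes:

```python
# Minimum Linux driver versions per CUDA runtime (illustrative values;
# check NVIDIA's CUDA compatibility table for your exact release).
MIN_DRIVER = {
    "10.1": (418, 39),
    "10.2": (440, 33),
    "11.0": (450, 36),
}

def parse_version(s):
    """Turn a dotted version string like '440.33.01' into a tuple of ints."""
    return tuple(int(p) for p in s.split("."))

def driver_supports(driver_version, cuda_version):
    """True if the installed driver is new enough for the given CUDA runtime."""
    needed = MIN_DRIVER[cuda_version]
    return parse_version(driver_version)[:len(needed)] >= needed

print(driver_supports("418.87.01", "10.2"))  # → False (predates 10.2's minimum)
print(driver_supports("440.33.01", "10.2"))  # → True
```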

After that, everything seems to work fine.


Hi, I notice that GCP instructions are no longer included in the new course — any idea why?


The course.fast.ai site used to be built from the https://github.com/fastai/course-v3 repository, but with the new course the repository switched to https://github.com/fastai/course20, which does not (yet?) include deployment documentation.

I’m using GCP and got everything running.
When I execute the first cell (imports) in the notebooks (e.g. in notebook 01_intro) I get the following ImportError:

ImportError: cannot import name 'mobilenet_v2' from 'torchvision.models' (/opt/conda/lib/python3.7/site-packages/torchvision/models/__init__.py)

I pip-installed the fastai2 library and updated it again, but the error is still the same.
Any idea?

I believe the instructions at the top of the thread are a bit out of date. I ran into the same issue and resolved it by uninstalling all of the existing fastai packages and then installing fastai (not fastai2 – the latest version of the fastai package is fastai v2). Then the code ran (though there are still some outstanding problems in my setup I’m trying to resolve).

This how-to post was also posted in the Fastai2 and new course thread, but it seemed more appropriate here in the GCP Platform thread.

Please note that fastai 2 requires torch 1.6.0 and torchvision 0.7.0. The CUDA drivers on the platform image are 10.1, which is too old for torch 1.6.0. There are no PyTorch images in the deeplearning-platform-release family with CUDA 10.2 or 11 drivers; however, the drivers can be updated to 10.2 per @micstan

Here are the steps I followed to setup the GCP image with the new release and the book:

Follow the old GCP setup guide here: http://course19.fast.ai/start_gcp.html

Open a terminal. I use PuTTY.

Login to terminal: gcloud compute ssh --zone "us-central1-b" "jupyter@fastai-4" -- -L 8080:localhost:8080

Install 10.2 cuda

wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run

Install fastai 2, fastcore and fastbook

cd tutorials
mv fastai fastai.old
git clone --recurse-submodules https://github.com/fastai/fastai
pip install -e "fastai[dev]"
git clone --recurse-submodules https://github.com/fastai/fastcore
cd fastcore
pip install -e ".[dev]"
cd ..
git clone https://github.com/fastai/fastbook.git
cd fastbook
pip install -r requirements.txt
cd ..

Check to see if pytorch and cuda are happy

python -c 'import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.is_available()); print(torch.cuda.current_device())'

Test a few notebooks in the course and the fastbook folders

Launch local browser: http://localhost:8080/tree/tutorials

Verify the notebooks run

Run notebook from: http://localhost:8080/tree/tutorials/fastai/dev_nbs/course
Run notebook from: http://localhost:8080/notebooks/tutorials/fastbook/

Run a notebook that does some training to make sure the GPU is being used, by looking at the training times for the epochs and checking the sm and mem columns output by nvidia-smi dmon

nvidia-smi dmon
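As a rough sketch, the sm column can also be checked programmatically — if it stays at 0 during training, the GPU is idle. The capture below is made up; in practice you would feed in the real output of `nvidia-smi dmon`:

```python
# Made-up sample of `nvidia-smi dmon` output. Lines starting with '#' are
# headers; the 5th column of each data row is GPU (sm) utilisation in %.
SAMPLE = """\
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    43    35     -    92    57     0     0  2505  1328
    0    45    36     -    88    61     0     0  2505  1328
"""

def gpu_busy(dmon_text, threshold=10):
    """Return True if any sampled sm utilisation exceeds the threshold (%)."""
    for line in dmon_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split()
        if int(cols[4]) > threshold:  # 5th column is sm utilisation
            return True
    return False

print(gpu_busy(SAMPLE))  # → True
```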

Perhaps others will have a more elegant solution, but for something quick to get started I haven’t run into any issues running fastai v2 and notebooks on GCP this way.

Cheers and many thanks for all the stellar work on the course, book and API! Mark


@markphillips, I just started a doc for the setup (https://github.com/fastai/course20/pull/9). Google Cloud now has a simplified method using “AI Platform” that I believe can be easier for many people. Please feel free to modify it. It works for me, but I’m not an expert in GCP, so it would be great if others could edit it too.


Nice! I’ll bet you’re right about this being easier for many people :grinning: An easier option with fewer knobs and an option with pretty much everything should cover all the bases.

Thanks for this, Mark! Looks like it’s working for me now. Minor nit, it’s print(torch.version.cuda). Thanks again!

Cool! I forgot to escape my underscores :grinning: print(torch.__version__);


Is your post above necessary to use a Tesla T4 on GCP? When I try to run
language_model_learner(qdl, AWD_LSTM, metrics=[accuracy, Perplexity()], wd=0.1).to_fp16()

I get the error `AssertionError: Mixed-precision training requires a GPU, remove the call to_fp16`.

I am following the examples on the text tutorial.

print(torch.__version__); print(torch.version.cuda); print(torch.cuda.is_available())
1.6.0 10.2 False

UPDATE: I followed the directions to install 10.2 cuda and it is working now. I did not have to follow all of the extra directions for github.

Glad it’s working with the new drivers! As you note, the GitHub install is not required unless you want an editable version of fastai 2 and fastcore. If you don’t want that, you can simplify the process by using conda:

conda install -c fastai -c pytorch fastai

If you want the fastbook notebooks, the GitHub clone above will get them.

Guys, GCP is denying my request for GPU quota increase. Any idea why this may be happening? Why does this even require an approval process?


@deep-learner Not all GPU models are available in all regions, which can be one of the reasons your request was rejected. There is some advice here: https://groups.google.com/g/gce-discussion/c/UWpvMNqkVjc?pli=1