Platform: GCP ✅

Thank you. Can you link to an install guide, or something similar?
My best guess now is to delete my instance and just start over (using the fast.ai GCP tutorial).

I use this: https://github.com/arunoda/fastai-shell

Thanks, that works wonders. One question: where do you store your own work? On my other instance I stored it in the same dl1 folder, but that breaks git pull (I have to stash and stash pop to get git pull working).

I have not been able to start my GPU machine on GCP for the past couple of days.
It says “quota exceeded globally”. I have started the machine without a GPU to do some data pre-processing, but I did not do any model training.

Is anyone in the same situation as me?
Should I move to AWS?

You need to increase your GPU quota and upgrade your account.
Search this thread for more info.

You can save anywhere inside your instance.
But I recommend doing it in a different directory and pushing your changes to GitHub.
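
A minimal sketch of that workflow, in case it helps. Both paths are assumptions (nothing in this thread says where the course repo lives on your instance), and `copy_to_work` is a hypothetical helper name, not part of fastai-shell:

```shell
# Hypothetical locations; adjust both to your own setup.
COURSE_DIR="${COURSE_DIR:-$HOME/course-v3}"  # assumed path of the course repo clone
WORK_DIR="${WORK_DIR:-$HOME/my-work}"        # hypothetical directory for your own notebooks

# Copy a file out of the course repo into WORK_DIR before editing it,
# so the tracked original stays unmodified and `git pull` keeps working.
copy_to_work() {
  src="$COURSE_DIR/$1"
  dest="$WORK_DIR/$(basename "$1")"
  mkdir -p "$WORK_DIR"
  if [ -e "$dest" ]; then
    # Refuse to overwrite an existing working copy.
    echo "skipping: $dest already exists" >&2
    return 1
  fi
  cp "$src" "$dest"
}
```

For example, `copy_to_work nbs/dl1/lesson1-pets.ipynb` would leave the original untouched and give you a copy to edit. `WORK_DIR` can then be its own git repository that you push to GitHub.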

Question: sgugger has committed a fix to the master branch which is not yet in the latest (tagged) release. At the time of writing, 1.0.27 is the latest tag, which is what I have after running the update_fastai.sh script.

Do you know of a quick way for me to download the master branch, while still maintaining the possibility to use update_fastai.sh if needed?

Simply do this: https://github.com/fastai/fastai#developer-install

Do this after running update-fastai.sh (which updates pytorch).

Perfect! After doing this, version shows 1.0.28.dev.
What would be the procedure to reverse this again?

Just install fastai via conda.
Or just run the update-fastai.sh script.

You can also check out a release tag in the repo instead of checking out master.
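
Switching between master and a release tag could look roughly like this. It is only a sketch: it assumes you already have a local clone of the fastai repo, and `list_tags` and `checkout_ref` are hypothetical helper names. The exact tag names are whatever `git tag` reports in your clone, so check them first rather than trusting the `1.0.27` used in the example below.

```shell
# List the release tags available in an existing clone at $1.
list_tags() {
  git -C "$1" tag --list
}

# Switch the clone at $1 to the ref in $2 (a release tag, or master).
checkout_ref() {
  git -C "$1" checkout --quiet "$2"
}
```

Usage would be something like `list_tags ~/fastai` followed by `checkout_ref ~/fastai 1.0.27`, and `checkout_ref ~/fastai master` to go back. If fastai was installed from that clone in editable mode (which is what a `pip install -e` style developer install does), the checked-out code is what gets imported, so no reinstall should be needed after switching.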

While trying to create a V100 instance I am receiving this error: Quota 'GPUS_ALL_REGIONS' exceeded. Limit: 0.0 globally.

Check your GPU quota settings (IAM & admin -> Quotas).
If you need to change a GPU quota, just file a ticket (select the GPU quota -> Edit).
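
If you prefer the command line, the current limits can probably be read with gcloud as well. This is a sketch, not an official recipe: `show_gpu_quotas` is a hypothetical helper, the project ID is yours to fill in, and the `grep` context lines assume the default YAML output lists `limit`, `metric` and `usage` in that order for each quota.

```shell
# Print GPU-related project-level quotas for the project ID passed as $1.
show_gpu_quotas() {
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud CLI not found on PATH" >&2
    return 1
  fi
  # Each quota is printed as three lines (limit, metric, usage);
  # -B1 -A1 keeps the limit and usage lines around each GPU metric.
  gcloud compute project-info describe --project "$1" | grep -B1 -A1 'GPU'
}
```

Calling `show_gpu_quotas my-project-id` (with your own project ID) should include the `GPUS_ALL_REGIONS` entry from the error above. Per-region GPU quotas are reported separately by `gcloud compute regions describe REGION`. Raising a limit still goes through the quota request described above.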

If you get this error when you SSH into a GCP instance:

[Connection Refused]

The solution is to execute this line in bash on your PC:
gcloud compute routes create default-internet --destination-range 0.0.0.0/0 --next-hop-gateway default-internet-gateway

This is because the default route for non-local traffic (0.0.0.0/0) had been inadvertently deleted, which caused all external traffic to be lost on the return path.

Source

@tillia I cannot see any quota related to GPU in my IAM & Admin -> Quotas

In the Metrics dropdown, filter by ‘GPU’ and you should see all GPU-related quotas.

@tillia In the filters I see all the GPUs and they are enabled (blue tick in front of them)

You have to filter for GPU quotas only (in the dropdown, select None, then filter by GPU and check the quotas you want to select). You should then have all GPU quotas listed, and the right side of the table should show the actual quota for each row. Tick the checkbox for the quotas you want to change, then select Edit and file a ticket :)

Just in case someone encounters this problem on GCP: training is very slow because pytorch only uses one process, even though you specified num_workers = x (>1) (normally fastai does this for you by default, with x = number of CPUs). This seems to be a bug in older pytorch versions (including the one that came preinstalled with the official image I used, following the fastai docs).

Upgrade pytorch with conda install pytorch-nightly -c pytorch, not conda update (which will tell you it is already on the latest version). That solves the problem: you get multiple workers and faster training. (Works at least with build pytorch.dev2018-11-30.)
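
To see what you are working with before and after the upgrade, something like the following may help. It is a sketch under assumptions: `cpu_count` and `torch_version` are hypothetical helper names, and `python` on your PATH is assumed to be the environment fastai runs in.

```shell
# Number of CPUs visible to the instance (the default worker count mentioned above).
cpu_count() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# Installed pytorch version; requires pytorch to be importable.
torch_version() {
  python -c 'import torch; print(torch.__version__)'
}
```

Run `torch_version` before and after the install to confirm that the version actually changed. While training, a process viewer such as `top` should then show several python processes rather than one if the workers are being used.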

I tried updating the course but got this error. Help, please?

jupyter@instance-1:~$ cd tutorials/fastai/course-v3

-bash: cd: tutorials/fastai/course-v3: No such file or directory
jupyter@instance-1:~$

Same here. Looks like it’s no longer available.
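
One thing worth trying before giving up is to check whether the folder simply lives somewhere else on the image. This thread does not confirm that it does, so treat this as a sketch; `find_course_dir` is a hypothetical helper name.

```shell
# Search under a root directory (default: $HOME) for a directory named course-v3.
find_course_dir() {
  find "${1:-$HOME}" -maxdepth 4 -type d -name course-v3 2>/dev/null
}
```

Running `find_course_dir` with no arguments searches your home directory up to four levels deep. If it prints nothing, cloning the course repo from https://github.com/fastai/course-v3 is probably the simplest way to get the notebooks back.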
