@Gaurav85 @adpostma I also faced a similar issue while creating an instance on GCP. I followed a few steps and got it created. (I am using the Google Cloud SDK shell.)
To create the instance, I used this command: gcloud compute instances create fast-ai-sahil-try --image-family=pytorch-1-0-cu92-experimental --image-project=deeplearning-platform-release --maintenance-policy=TERMINATE --accelerator=type=nvidia-tesla-k80,count=1 --machine-type=n1-highmem-8 --boot-disk-size=200GB --metadata=install-nvidia-driver=True --preemptible
Please check whether this works for you. If not, please post the error you get so that I can see whether I ran into it before. Let's solve this (I know getting these things running on Win 7 can be a bit painful).
I had the same issue when I followed the setup procedure. Arunoda’s right about it being a quota issue. IIRC, you’ll need to click through the GCP web interface to the IAM & admin section to request access to a GPU. This guide, which is linked under the fastai_v3 topic “Unofficial Setup thread (Local, AWS)”, has an image of what you need to find under ‘Step 5’.
It tells you to wait for an approval email, but the approval happened instantly when I tried it.
You can find the GPUs by filtering on Service: Compute Engine API and then the NVIDIA ... options under Metrics. Do not filter on GPUs, because nothing seems to show up.
Also, I have not installed any software on my computer. Everything works using the Cloud Shell and web GUI.
Gaurav, the trick here is to try the command 3-4 times. I also had the issue where it says there are no GPU instances available, but try it 3-4 times and you will be successful. Hope this helps.
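The "try it 3-4 times" advice can be scripted. This is only a rough sketch: it assumes gcloud is on your PATH (on Windows the executable may need to be given as gcloud.cmd), and the names create_with_retries, attempts and delay_s are made up for illustration. The arguments are the ones from the create command posted above.

```python
import subprocess
import time

# The instance-creation command from earlier in the thread, split into arguments.
CREATE_CMD = [
    "gcloud", "compute", "instances", "create", "fast-ai-sahil-try",
    "--image-family=pytorch-1-0-cu92-experimental",
    "--image-project=deeplearning-platform-release",
    "--maintenance-policy=TERMINATE",
    "--accelerator=type=nvidia-tesla-k80,count=1",
    "--machine-type=n1-highmem-8",
    "--boot-disk-size=200GB",
    "--metadata=install-nvidia-driver=True",
    "--preemptible",
]


def run_once(cmd):
    """Run the command once and return its exit code (0 means success)."""
    return subprocess.run(cmd).returncode


def create_with_retries(cmd=CREATE_CMD, attempts=4, delay_s=30,
                        run=run_once, sleep=time.sleep):
    """Try cmd up to `attempts` times.

    Returns the number of the attempt that succeeded, or None if all failed.
    `attempts` and `delay_s` are illustrative defaults, not recommended values.
    """
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # wait a little before asking for a GPU again
    return None
```

Calling create_with_retries() with no arguments runs the real command; pass your own argument list if your instance name or zone differs.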
Yeah my instance is on. I also tried: gcloud compute config-ssh
The verbose output when I try to ssh is:
OpenSSH_7.7p1, LibreSSL 2.7.3
debug1: Reading configuration data /Users/sarah/.ssh/config
debug1: /Users/sarah/.ssh/config line 46: Applying options for my-fastai-instance.us-west2-b.modeltraining-120938
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 48: Applying options for *
debug2: resolve_canonicalize: hostname 35.236.121.17 is address
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to 35.236.121.17 [35.236.121.17] port 22.
ssh: connect to host 35.236.41.144 port 22: Operation timed out
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
UPDATE
I managed to fix this by going into the ~/.ssh directory. I then removed both the google_compute_engine and google_compute_engine.pub files and was able to ssh into my instance (new files were created to replace the ones removed).
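For anyone who wants to do the same cleanup from Python rather than by hand, here is a minimal sketch. It assumes the key pair lives in ~/.ssh under the two file names mentioned above; the function name remove_gcloud_ssh_keys and the ssh_dir parameter are hypothetical. It deletes the files, so copy them elsewhere first if you are unsure.

```python
from pathlib import Path

# File names of the key pair mentioned in the post above.
KEY_NAMES = ("google_compute_engine", "google_compute_engine.pub")


def remove_gcloud_ssh_keys(ssh_dir=None):
    """Delete the key pair from ssh_dir (default: ~/.ssh).

    Returns the list of file names that were actually removed.
    Other files in the directory (config, known_hosts, ...) are left alone.
    """
    ssh_dir = Path(ssh_dir) if ssh_dir is not None else Path.home() / ".ssh"
    removed = []
    for name in KEY_NAMES:
        key_file = ssh_dir / name
        if key_file.is_file():
            key_file.unlink()
            removed.append(name)
    return removed
```

After that, connecting with gcloud compute ssh again should generate a fresh pair, as happened in my case.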
I had the same problem for a full day: I could not connect to my instance anymore, even after tens of attempts.
In the documentation it says that free accounts do not have standard GPU quota.
I ran into the same issue just now, though I had no issue when I originally set up my account and instance. I simply hit the button to “Upgrade” my account and then everything worked as before.
By “create a new conda environment” do you mean create a new instance?
Any guides on how I would do that? (There were many setup parameters for the fastai instance; do I copy them?)
If you mean staying within the fastai instance, I don’t think I should be using export IMAGE_FAMILY="pytorch-1-0-cu92-experimental", because it has the CUDA 9.2 compiler, which does not fit with tensorflow-gpu. https://stackoverflow.com/questions/50605684/support-for-nvidia-cuda-toolkit-9-2
I am trying to create a conda environment in my GCP instance in which I can also run the previous fastai 0.7 version. I created the environment using conda env create -f environment.yml. The ‘environment.yml’ file is the one used to create the fastai 0.7 GPU environment. But once it is created, I am not able to see it as a kernel in my Jupyter notebook. I tried installing nb_conda in both the ‘base’ and the ‘fastai0.7’ environments, but it still does not work. Can anyone help with this?
I needed to use os.chdir('/foo/bar/baz') to change directories in Jupyter notebooks. If you use it, you will need to import os.
For ls, try %ls path or !ls path. And if you want the list of files from ls in columns, try %ls --width 125 - it spreads the file names horizontally, so it doesn’t take up as much vertical space.
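Put together, the directory change and a plain-Python listing look roughly like this. The helper name list_dir_in is made up for illustration; os.listdir is swapped in for the %ls magic because the magics only work inside a notebook cell.

```python
import os


def list_dir_in(path):
    """Change into `path`, then return (current directory, sorted entries)."""
    os.chdir(path)  # same call as in the post above
    return os.getcwd(), sorted(os.listdir("."))

# Inside a notebook cell, the magics mentioned above do the listing instead:
#   %ls path
#   !ls path
```

Note that os.chdir changes the working directory for the whole notebook kernel, not just for the current cell.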
Today when I tried to boot my VM I got an error: “Quota ‘GPUS_ALL_REGIONS’ exceeded. Limit: 0.0 globally.”
Which is weird.
My VM setup is identical to the one in the manual, except that I changed the zone to europe-west4-a because I live in the EU.
The quotas page in GCP says the quota is 1 for my zone.
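The error names GPUS_ALL_REGIONS, which looks like a project-wide limit rather than the per-zone number shown on the quotas page, so it may be worth checking that entry separately. A minimal sketch, assuming that gcloud compute project-info describe --format=json returns a "quotas" list whose entries have "metric", "limit" and "usage" fields; the find_quota helper and the sample data are made up for illustration.

```python
import json


def find_quota(project_info, metric="GPUS_ALL_REGIONS"):
    """Return the quota entry for `metric`, or None if it is not listed."""
    for quota in project_info.get("quotas", []):
        if quota.get("metric") == metric:
            return quota
    return None


# Made-up sample in the shape the command is assumed to return.
sample = json.loads(
    '{"quotas": [{"metric": "GPUS_ALL_REGIONS", "limit": 0.0, "usage": 0.0}]}'
)
print(find_quota(sample))
```

If the limit there is 0 while the regional GPU quota is 1, that would match the error message above.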