Platform: GCP ✅

I am getting this error on step 3 while closely following the GCP tutorial (https://course.fast.ai/start_gcp.html):

ERROR: (gcloud.compute.instances.create) Could not fetch resource:

  • The resource 'projects/…/zones/us-west1-b/acceleratorTypes/nvidia-tesla-p4' was not found

What happened?

@dries

You could try export ZONE="us-west2-b" because that worked for me.

Thanks! That did the trick.

This did the trick for me too. This should be added to the server setup tutorial.
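
For anyone hitting the same error, here is a minimal sketch for checking which zones actually offer the GPU before picking a ZONE. The helper name list_gpu_zones is made up for illustration, and it assumes the gcloud CLI is installed and authenticated. us-west2-b is simply the zone reported to work above; availability may have changed since.

```shell
# Hypothetical helper (not part of the tutorial): list the zones in which
# a given accelerator type is offered.
list_gpu_zones() {
    gcloud compute accelerator-types list --filter="name=$1"
}

# Example call: list_gpu_zones nvidia-tesla-p4

# Then point the tutorial's ZONE variable at one of the listed zones,
# for instance the one that worked in this thread:
export ZONE="us-west2-b"
```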

Hi,

Instead of stopping my instance, I clicked on the delete option by mistake.

Is there any way to get my instance back?

If I cannot retrieve it and end up creating a new instance, will I still be able to create it with the cheap billing option that offers the $300 credit?

Thanks.

Hello,

I deployed a server with Ubuntu on GCP and with fastai at: https://console.cloud.google.com/marketplace/details/click-to-deploy-images/deeplearning

For long training runs, I would like the Jupyter notebook to keep running even when my browser or local computer is turned off.
I read that this is possible with screen or tmux, but so far I have not figured out exactly how to do it.

Can somebody explain how to do that with an appropriate tool?
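
One possible approach with tmux, as a rough sketch: start the long job inside a detached tmux session on the VM, so it keeps running when the SSH connection or the browser goes away. The helper name run_detached, the session name and the notebook file name are placeholders, and this assumes tmux is installed on the instance.

```shell
# Hypothetical helper: run a long command inside a detached tmux session
# so it survives a dropped SSH connection or a closed browser.
run_detached() {
    session="$1"
    shift
    # -d starts the session detached, -s gives it a name
    tmux new-session -d -s "$session" "$*"
}

# Example (session and notebook names are placeholders):
#   run_detached training "jupyter nbconvert --to notebook --execute lesson.ipynb"
# Reattach later to check on progress:
#   tmux attach -t training
# Detach again with Ctrl-b followed by d.
```

The same idea works with screen; tmux is used here only because it was mentioned in the question.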

Deleting and recreating instances on GCP is fine, so you’ll just need to recreate it, which will turn the machine on for the first time.

You will be charged for the first minute no matter what:

  1. All vCPUs, GPUs, and GB of memory are charged a minimum of 1 minute. For example, if you run your virtual machine for 30 seconds, you will be billed for 1 minute of usage.
  2. After 1 minute, instances are charged in 1 second increments.
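
The two rules above boil down to a one-minute floor followed by per-second billing. A small sketch of that rule (the function name billed_seconds is just for illustration):

```shell
# Billed time in seconds for a run of $1 seconds:
# anything shorter than 60 s is rounded up to 60 s,
# longer runs are billed per second.
billed_seconds() {
    if [ "$1" -lt 60 ]; then
        echo 60
    else
        echo "$1"
    fi
}

billed_seconds 30   # prints 60
billed_seconds 90   # prints 90
```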

Thank you !

Just a brief comment regarding a workaround I needed to get fastai working on Windows.

Hello, new member of the course and forums here so please accept my apologies if I haven’t posted properly (couldn’t find a Post button so I am replying to the pinned post).

In short, Step 3 of the Tutorial on GCP setup suggests using an Ubuntu terminal, but I had a very annoying issue as my Windows (or Microsoft) Store would not install anything no matter how many things I tried (I was not about to reinstall Windows entirely).

After several gruelling days, my fix was to install the Google Cloud SDK manually (after setting up a GCP billing account and project, of course), then type "gcloud init" and "gcloud compute ssh INSTANCE-NAME -- -L 8080:localhost:8080" in the Cloud SDK shell, and finally open "http://localhost:8080" in my browser (where INSTANCE-NAME is the name of your VM instance).

I really hope this might help anyone with this silly Windows Store issue (quite a common one apparently) trying to set up on GCP.

Very excited to deep learn!

Kind regards,
Milad

Hi everyone,

I’m just wondering about the actual cost of the “Budget Compute” setup in the setup guide. According to the guide, “this setup will roughly double your training time”. Doesn’t this theoretically mean that to accomplish the same amount of work the standard setup does in 1 hour (which costs $0.38), the budget setup would need 2 hours (which costs 2 x $0.23 = $0.46, an increase of 8 cents)?

Has anyone encountered this theoretical problem with the budget setup? What is your actual experience with it, i.e. how much slower is it on average?

Thanks a lot!
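
The arithmetic in the question can be checked with a short sketch. It uses only the hourly prices quoted above ($0.38 and $0.23) and the guide's "roughly double" estimate; the function name compare_cost is made up here, and real prices may differ.

```shell
# compare_cost STANDARD_RATE BUDGET_RATE SLOWDOWN
# Prints the cost of one "standard hour" of work on each setup, and the
# slowdown factor at which both setups cost the same.
compare_cost() {
    awk -v s="$1" -v b="$2" -v k="$3" 'BEGIN {
        printf "standard $%.2f, budget $%.2f, break-even slowdown %.2fx\n", s, b * k, s / b
    }'
}

compare_cost 0.38 0.23 2
# prints: standard $0.38, budget $0.46, break-even slowdown 1.65x
```

With these numbers, the budget setup only comes out cheaper if it is less than about 1.65 times slower than the standard one.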

Hi, there

I’m desperately looking for help setting up and connecting to GCP.
I have created the instance through Cloud Shell. However, I get an error message when I try to connect to my instance using the following line in Cloud Shell:

gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080

The error msg I got is:
ssh: connect to host 34.83.191.14 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].

It seems GCP is having trouble establishing the SSH connection. Any suggestions on how to solve this problem?

Greatly appreciated!

Update:
Just to add some additional information: the instance was created using the following lines in Cloud Shell

export IMAGE_FAMILY="pytorch-latest-gpu" # or "pytorch-latest-cpu" for non-GPU instances
export ZONE="us-west1-b"
export INSTANCE_NAME="my-fastai-instance"
export INSTANCE_TYPE="n1-highmem-8" # budget: "n1-highmem-4"

# budget: 'type=nvidia-tesla-k80,count=1'
gcloud compute instances create $INSTANCE_NAME \
        --zone=$ZONE \
        --image-family=$IMAGE_FAMILY \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --accelerator="type=nvidia-tesla-p100,count=1" \
        --machine-type=$INSTANCE_TYPE \
        --boot-disk-size=200GB \
        --metadata="install-nvidia-driver=True" \
        --preemptible

Having exactly the same issue since yesterday. I have browsed through multiple threads with no luck. If this is such a big problem, why is GCP even recommended?

I started my instance through the browser as per the initial recommendation, but that made no difference. The instance shuts down on its own after a few minutes.

Hoping to get past this so I can move onto real interesting stuff!

My experience of GCP over the past few weeks has been similar - being kicked off a preemptible VM after a few minutes. If you persist and keep reconnecting eventually you get lucky and connect for a session that lasts a few hours. It is very frustrating!

I suggest you try moving to Google Colab - it seems to be where all the cool kids are going these days.

I am also experiencing similar issues with my instance on GCP, but I am not sure if this is an SSH issue or due to preemption… I created my instance as described in the fastai docs. In the last few days, I get kicked off the instance randomly: most of the time directly after starting it and connecting via SSH, often after a few minutes, and only sometimes am I able to keep a connection for more than 10-15 minutes.
The error message I get in the terminal is always
Connection to [IP address] closed by remote host.
Connection to [IP address] closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Sometimes, there are some error messages right before being kicked out which say
channel 3: open failed: connect failed: Connection refused
channel 4: open failed: connect failed: Connection refused

I have almost no experience with SSH, so any help or clarification is much appreciated 🙂!

@AlisonDavey what does ‘being kicked off a preemptible VM’ look like? In the fastai guide for setting up GCP it says you’ll get notified 30 seconds before your instance is shut off, but I never get any notice apart from the Jupyter notebook error message that the connection was lost…

I just assumed that the closing-down message “Connection to [IP address] closed by remote host. Connection to [IP address] closed. ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].” was due to preemption, because often when trying to reconnect I would then get the message “Instance failed to start due to preemption.” I never got any warning that the instance was going to close.

After being closed down 3 or 4 times eventually I managed to start a session that stayed open long enough to do something.

One day I was getting the “channel 3: open failed: connect failed: Connection refused” and “channel 4: open failed: connect failed: Connection refused” messages, but that turned out to be because the instance’s disk was too full to run the Jupyter Notebook. I hadn’t realised that deleted files only went to Trash and were still taking up space. 😳

You can check View logs from the vertical dots on the right hand side of the VM instances page on GCP and see messages such as Compute Engine preempted us-west1-b:my-fastai-instance system@google.com {"@type":"type.googleapis.com/google.cloud.audit.Aud
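
The same information is available from the command line. A sketch, assuming the gcloud CLI is set up for the project; the helper names instance_status and list_preemptions are made up for illustration:

```shell
# Print the current status of an instance, e.g. RUNNING or TERMINATED.
# Usage: instance_status my-fastai-instance us-west1-b
instance_status() {
    gcloud compute instances describe "$1" --zone="$2" --format="value(status)"
}

# List the preemption events recorded for the current project.
list_preemptions() {
    gcloud compute operations list \
        --filter="operationType=compute.instances.preempted"
}
```

A status of TERMINATED right after a dropped connection, together with a matching entry in the preemption list, points to preemption and not to an SSH problem.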

Ah I see! I did not know about the logs, but your assumption seems to be correct: the logs clearly show that every time I was kicked off, it was due to preemption (although I never got the 30-second notice described in the GCP docs…). Thanks! 🙂

Could you also give me some guidance on how you cleaned up your Trash space?

Just use cd to get to ~/.local/share/Trash/files (the prompt should then read jupyter@my-fastai-instance:~/.local/share/Trash/files$) and then rm to get rid of the deleted files.
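
Spelled out as a sketch (the helper name empty_trash is made up, and the default path assumes the Trash location mentioned above):

```shell
# Hypothetical helper: report how much space the Trash uses, then empty it.
# Usage: empty_trash            (uses ~/.local/share/Trash)
#        empty_trash /some/dir  (any directory with files/ and info/ inside)
empty_trash() {
    trash_dir="${1:-$HOME/.local/share/Trash}"
    # Nothing to do if there is no Trash directory yet
    [ -d "$trash_dir/files" ] || return 0
    du -sh "$trash_dir"
    # files/ holds the deleted files, info/ holds their metadata
    rm -rf "$trash_dir/files/"* "$trash_dir/info/"*
}
```

Running df -h ~ before and after shows how much disk space was freed.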

GCP used to work much better than it is doing now. Hopefully this is a temporary blip.

Try creating (and connecting to) your instance using a terminal on your own computer instead of Cloud Shell, and see if it works.

Hey everyone, I built a little macOS statusbar app that allows you to start, stop and monitor your GCP instance. It shows you whether your instance is stopped (🌚), starting (🌜), running (🌝) or stopping (🌛). I’ve just finished building it and haven’t tested it extensively, so any feedback you might have is very welcome 🙂

You can download the app (and find the code) here https://github.com/jaapmengers/ComputeInstanceStatus

Can you explain how to set up GitHub for fastai?
I am following this guide and successfully completed step 3, but I am stuck at step 4. I am not sure what it means when the guide says you should make sure GitHub is configured and pull from the repository. I have never used GitHub before and would like to know if there is a guide to help.