Platform: GCP ✅

Hi, there

I’m desperately looking for help setting up and connecting to GCP.
I created the instance through Cloud Shell. However, I get an error message when I try to connect to the instance with the following line in Cloud Shell:

gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080

The error message I got is:
ssh: connect to host 34.83.191.14 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].

It seems GCP is having trouble establishing the SSH connection. Any suggestions on how to solve this problem?

Greatly appreciated!

Update:
Just to add some additional information: the instance was created with the following lines in Cloud Shell:

export IMAGE_FAMILY="pytorch-latest-gpu" # or "pytorch-latest-cpu" for non-GPU instances
export ZONE="us-west1-b"
export INSTANCE_NAME="my-fastai-instance"
export INSTANCE_TYPE="n1-highmem-8" # budget: "n1-highmem-4"

# budget: 'type=nvidia-tesla-k80,count=1'
gcloud compute instances create $INSTANCE_NAME \
        --zone=$ZONE \
        --image-family=$IMAGE_FAMILY \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --accelerator="type=nvidia-tesla-p100,count=1" \
        --machine-type=$INSTANCE_TYPE \
        --boot-disk-size=200GB \
        --metadata="install-nvidia-driver=True" \
        --preemptible
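
Before retrying the SSH command above, it may help to confirm that the instance is actually running. "Connection refused" on port 22 can have several causes; one possibility is that the VM was stopped or preempted, another is that its SSH server has not finished starting. The sketch below assumes `ZONE` and `INSTANCE_NAME` are exported as in the create step. `instance_status` and `ssh_if_running` are hypothetical helper names, not part of gcloud.

```shell
# Sketch only: assumes ZONE and INSTANCE_NAME are exported as in the create step.

# Print the instance's lifecycle state, e.g. RUNNING or TERMINATED.
instance_status() {
    gcloud compute instances describe "$INSTANCE_NAME" \
        --zone="$ZONE" \
        --format='get(status)'
}

# Hypothetical helper: open the SSH tunnel only if the instance is RUNNING.
ssh_if_running() {
    status="$(instance_status)"
    if [ "$status" = "RUNNING" ]; then
        gcloud compute ssh --zone="$ZONE" "jupyter@$INSTANCE_NAME" -- -L 8080:localhost:8080
    else
        echo "Instance status is '$status', not RUNNING; start it or wait, then retry." >&2
        return 1
    fi
}
```

Calling `ssh_if_running` right after starting the instance may still fail for the first minute or so while the VM boots, so retrying after a short wait is reasonable.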

I have been having exactly the same issue since yesterday. I have browsed through multiple threads with no luck. If this is such a big problem, why is GCP even recommended?

I started my instance through the browser as per the initial recommendation, but that made no difference. The instance shuts down on its own after a few minutes.

Hoping to get past this so I can move on to the really interesting stuff!

My experience of GCP over the past few weeks has been similar - being kicked off a preemptible VM after a few minutes. If you persist and keep reconnecting eventually you get lucky and connect for a session that lasts a few hours. It is very frustrating!

I suggest you try moving to Google Colab - it seems to be where all the cool kids are going these days.

I am also experiencing similar issues with my instance on GCP, but I am not sure whether this is an SSH issue or due to preemption… I created my instance as described in the fastai docs. In the last few days, I have been kicked off the instance randomly: most of the time directly after starting it and connecting via SSH, often after a few minutes, and only sometimes am I able to keep a connection for more than 10-15 minutes.
The error message I get in the terminal is always
Connection to [IP address] closed by remote host.
Connection to [IP address] closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Sometimes, right before I get kicked off, there are also error messages that say
channel 3: open failed: connect failed: Connection refused
channel 4: open failed: connect failed: Connection refused

I have almost no experience with SSH, so any help or clarification is much appreciated :slight_smile:!

@AlisonDavey what does ‘being kicked off a preemptible VM’ look like? The fastai guide for setting up GCP says you’ll get a notice 30 seconds before your instance is shut off, but I never get any notice, only the Jupyter notebook error message that the connection was lost…

I just assumed that the closing-down message “Connection to [IP address] closed by remote host. Connection to [IP address] closed. ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].” was due to preemption, because often when trying to reconnect I would then get the message “Instance failed to start due to preemption.” I never got any warning that the instance was going to close.
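
On the missing 30-second warning: as far as I know, the preemption notice is not printed in the SSH session or in Jupyter. It is delivered to the VM as a shutdown signal and is also exposed through the Compute Engine metadata server, so you only see it if something on the instance is watching for it. A minimal sketch of checking that value from inside the VM follows; the function names are hypothetical, and the commands only work when run on the instance itself.

```shell
# Run these ON the instance; the metadata server is only reachable from inside the VM.

# Print TRUE if the VM has been preempted, FALSE otherwise.
is_preempted() {
    curl -s -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
}

# Block until the value changes, then print the new value.
wait_for_preemption() {
    curl -s -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/preempted?wait_for_change=true"
}
```

Running `wait_for_preemption` in a spare terminal on the instance should return once the VM is marked for preemption, which leaves only a short window to save work.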

After being closed down 3 or 4 times eventually I managed to start a session that stayed open long enough to do something.

One day I was getting the “channel 3: open failed: connect failed: Connection refused” and “channel 4: open failed: connect failed: Connection refused” messages, but that turned out to be because the instance’s disk was too full to run the Jupyter Notebook. I hadn’t realised that deleted files only went to Trash and were still taking up space. :flushed:

You can choose “View logs” from the vertical dots on the right-hand side of the VM instances page on GCP and see messages such as: Compute Engine preempted us-west1-b:my-fastai-instance system@google.com {"@type":"type.googleapis.com/google.cloud.audit.Aud
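
The same preemption events can also be listed from the command line instead of the console. This is a sketch that assumes gcloud is already configured for the right project; `list_preemptions` is a hypothetical wrapper name.

```shell
# List past preemption events for the current project.
list_preemptions() {
    gcloud compute operations list \
        --filter="operationType=compute.instances.preempted"
}
```

If `list_preemptions` shows an entry at the time your session dropped, the disconnect was a preemption rather than an SSH problem.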


Ah, I see! I did not know about the logs, but your assumption seems to be correct: the logs clearly show that every time I was kicked off, it was due to preemption (even though I never got a 30-second notice as described in the GCP docs…). Thanks! :slight_smile:

Could you also give me some guidance on how you cleaned up your Trash space?

Just cd to ~/.local/share/Trash/files on the instance (the prompt then reads jupyter@my-fastai-instance:~/.local/share/Trash/files$) and use rm to get rid of the deleted files.

GCP used to work much better than it does now. Hopefully this is a temporary blip.
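
The Trash cleanup described above can be sketched as a small function. It assumes the standard desktop Trash layout with `files` and `info` subfolders under ~/.local/share/Trash; `empty_trash` is a hypothetical name, and the deletion is permanent, so check the folder contents first.

```shell
# Permanently delete everything inside the Trash folder's "files" and "info"
# subfolders, including hidden files. The folders themselves are kept.
# Optional argument: path to the Trash folder (default: ~/.local/share/Trash).
empty_trash() {
    trash_dir="${1:-$HOME/.local/share/Trash}"
    for sub in files info; do
        if [ -d "$trash_dir/$sub" ]; then
            find "$trash_dir/$sub" -mindepth 1 -delete
        fi
    done
}
```

Running `df -h /` before and after `empty_trash` shows how much space was freed on the root disk.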


Try creating (and connecting to) your instance from a terminal on your own computer instead of Cloud Shell, and see if that works.

Hey everyone, I built a little macOS status bar app that allows you to start, stop and monitor your GCP instance. It shows you whether your instance is stopped (:new_moon_with_face:), starting (:last_quarter_moon_with_face:), running (:full_moon_with_face:) or stopping (:first_quarter_moon_with_face:). I’ve just finished building it and haven’t tested it extensively, so any feedback you might have is very welcome :slight_smile:

You can download the app (and find the code) here https://github.com/jaapmengers/ComputeInstanceStatus


Can you explain how to set up GitHub for fastai?
I am following this guide and successfully completed step 3, but I am stuck at step 4. I am not sure what the guide means when it says you should make sure GitHub is configured and pull from the repository. I have never used GitHub before and would like to know if there is a guide to help.

I’m trying to set up GCP and got the following message when I did the ‘create an instance’ step:

WARNING: Some requests generated warnings: - Disk size: ‘200 GB’ is larger than image size: ‘50 GB’. You might need to resize the root repartition manually if the operating system does not support automatic resizing. See https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd for details.

I tried to follow the instructions at the link, but when I went to try to resize the disk, it already shows up as 200GB, so I don’t know what to change.

Can anyone help with this? I see that @matmat96 and @AndreaPi have both previously posted about this same problem, but I can’t seem to see any responses to their posts.

Thanks!
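
A note on the warning above: it concerns the filesystem inside the VM, not the disk size shown in the console, so the console already showing 200GB is expected. Many images grow the root partition automatically on first boot, though whether this image does is an assumption worth verifying. A minimal check, run on the instance itself (`root_fs_size` is a hypothetical helper name):

```shell
# Print the total size of the root filesystem as seen from inside the VM.
root_fs_size() {
    df -hP / | awk 'NR == 2 { print $2 }'
}
```

If `root_fs_size` reports a value close to 200G, the filesystem already covers the whole disk and the warning can likely be ignored. If it reports about 50G, the manual resize steps at the linked page would apply.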

That is because your instance quota limit is 0. Check the following page to adjust your instance quota, and then redo “Step 3: Create an Instance”.

https://course.fast.ai/start_gcp.html#step-3-create-an-instance

Does anyone have any suggestions on how I can get QtConsole to connect to the Jupyter notebook on my Google Cloud compute instance?
I’m on Windows 10. I ran %connect_info in the Jupyter notebook on the instance, pasted the part inside {} into a new kern.json file, and tried to connect to the kernel on the instance from my machine. First I tried
“jupyter qtconsole --existing C:\WinPy\WPy64-3740\kern.json”, which results in a QtConsole popping up and endlessly cycling through “kernel died/restarting”.
I’ve also tried
“jupyter qtconsole --ssh=remote --existing C:\WinPy\WPy64-3740\kern.json”, and that just results in… nothing. No error, no console, just nothing.

If no one is sure, I’m happy to hear any suggestions for what to use to step through code with a debugger. Using %%debug or %pdb inside a notebook cell directly is just incredibly messy.

Hi, is GCP working for you now? I’m seeing similar things but with a brand new instance under a brand new account, so there shouldn’t be a trash issue. It wasn’t like this even just a month ago…

You mean that the VM instance keeps shutting down? I am having that problem right now…

Yeah. Is this with an instance that you have already been using, say, for a few months now, one that was once good?

I set up this instance months ago and used it a bit for the lesson 1 notebook, but then stopped due to other endeavors. I think it was fine then; I don’t remember having these kinds of issues. I have now been trying to use it for 2 weeks, but I can’t get a good workflow going because it keeps shutting down. Why do you ask? Does it make a difference whether you have been using it for a long time or not?

Other people have also reported the same; there are comments above from Oct 8 and from 28 days ago about this. I am thinking of going with Google Colab for small data and the regular lessons, and of creating a normal VM instance for my project, which contains 30 GB of data.

The instructions for creating an instance at https://course.fast.ai/start_gcp.html#step-3-create-an-instance have changed in the last couple of months.

The zone has changed from us-west2-b to us-west1-b, and the GPU from P4 to P100.

For some reason, the instance I created a few days ago, after these changes, just keeps getting preempted immediately after being started. The instance is not assigned an external IP like before either.

I’ll try a different region/zone, and maybe a different GPU, to see if anything changes.


So, us-central seems to be a bit better for not getting preempted, as some people have also mentioned here. I guess one can also create a normal instance that is not preemptible. I think these are about 5 times more expensive though.
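
Both options above (a different zone, or a non-preemptible instance) can be sketched by wrapping the create command from the top of the thread. It assumes `IMAGE_FAMILY`, `INSTANCE_NAME` and `INSTANCE_TYPE` are exported as in that step. The function names are hypothetical, and the zone used in the example call below is an assumption; check which zones actually offer the GPU before picking one.

```shell
# List zones that offer a given GPU type (default: the P100 used in this thread).
zones_with_gpu() {
    gcloud compute accelerator-types list \
        --filter="name=${1:-nvidia-tesla-p100}"
}

# Create the instance in the given zone.
# Pass "preemptible" as the second argument for the cheaper preemptible variant;
# omit it to create a normal (non-preemptible, more expensive) instance.
create_fastai_instance() {
    zone="$1"
    preempt_flag=""
    if [ "${2:-}" = "preemptible" ]; then
        preempt_flag="--preemptible"
    fi
    # $preempt_flag is left unquoted on purpose so it vanishes when empty.
    gcloud compute instances create "$INSTANCE_NAME" \
        --zone="$zone" \
        --image-family="$IMAGE_FAMILY" \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --accelerator="type=nvidia-tesla-p100,count=1" \
        --machine-type="$INSTANCE_TYPE" \
        --boot-disk-size=200GB \
        --metadata="install-nvidia-driver=True" \
        $preempt_flag
}
```

For example, `create_fastai_instance us-central1-c` would request a non-preemptible instance in that zone, provided `zones_with_gpu` lists it and your GPU quota covers it.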
