Platform: GCP ✅

Well… turns out I fixed the problem. Originally, I had been using the “Google Cloud Shell” that you can activate using the button in the top right corner when you start your instances (and it comes up at the bottom of the screen).

Turns out that if I do it properly and use the “Google Cloud SDK Shell” (i.e. the one you download and install on your own computer), it works like a charm when following the tutorials. I guess the lesson learnt is to follow the tutorials properly :stuck_out_tongue:

Thanks everyone for your help - really appreciate it :slight_smile:

arunoda has a GCP tool/script that seems to work well. I solved my connection issue by using one of two alternate web addresses: www.crestle.ai or colab.google.com. :wink: I don’t want to work harder on my setup than I do on the actual course material.


Glad you figured it out b/c I’m fresh out of ideas.

us-west2-c worked for me now. Thanks

No resources in us-west2-b for a Tesla P4 GPU? I have been trying for hours! It worked until yesterday!

I have just started using this by @arunoda. With it you can change the instance type and zone on the fly, so you should never be locked out of your code: you can always spawn a non-GPU instance.

Also, did you try without the preemptible option? That should be the same as using a dedicated (on-demand) instance as opposed to a spot instance on AWS.
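
For reference, the only difference on the command line is the `--preemptible` flag. A rough sketch (instance name, zone, and machine type here are placeholders, not necessarily what the course setup uses):

```
# Preemptible: cheaper, but it can be reclaimed and seems more likely to hit
# "no resources" errors. Drop --preemptible to get a dedicated (on-demand) VM.
gcloud compute instances create my-fastai-instance \
    --zone=us-west2-b \
    --machine-type=n1-highmem-8 \
    --accelerator=type=nvidia-tesla-p4,count=1 \
    --maintenance-policy=TERMINATE \
    --preemptible
```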

Thanks

Hi Paul,

Today I hit the same problem: no resources for my instance in the us-west2-b zone. So this is what I did to make my instance setup dynamic, as per this blog post.

You can follow these steps if you already have an instance created as per the official procedure (an instance with a specific GPU attached to it).

  1. Go to VM Instances, edit the instance, and deselect the option ‘Delete boot disk when instance is deleted’.
  2. You can now delete your VM instance. This will not delete the disk.
  3. Follow the blog and create (a) the network and (b) the firewall rules.
  4. You can then use the gcloud beta compute command to create an instance with your required choice of GPU (or without one) and attach it to the existing disk. That way you ALWAYS have access to your code (see the sketch after this list).
  5. Though it is possible to move instances between zones, I have created a new project in a different location and will use whichever has GPU resources available.
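
Roughly, the gcloud equivalents of steps 1–4 look like this (all names, zones, and ports are placeholders; follow the blog post for the exact values):

```
# 1. Keep the boot disk around when the VM is deleted.
gcloud compute instances set-disk-auto-delete my-fastai-instance \
    --zone=us-west2-b --disk=my-fastai-disk --no-auto-delete

# 2. Delete the VM but keep its boot disk.
gcloud compute instances delete my-fastai-instance \
    --zone=us-west2-b --keep-disks=boot

# 3. Network and firewall rules (ports are a guess; use the blog post's values).
gcloud compute networks create fastai-net
gcloud compute firewall-rules create fastai-allow \
    --network=fastai-net --allow=tcp:22,tcp:8888

# 4. Re-create an instance (with or without a GPU) attached to the existing disk.
gcloud beta compute instances create my-fastai-instance \
    --zone=us-west2-b \
    --network=fastai-net \
    --machine-type=n1-highmem-8 \
    --accelerator=type=nvidia-tesla-p4,count=1 \
    --maintenance-policy=TERMINATE \
    --disk=name=my-fastai-disk,boot=yes,auto-delete=no
```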

Let me know if this helps your case.

EDIT: I ran into environment-related issues. When I tried again, the fastai-shell commands worked (earlier they didn’t). This is the better option, as it allows you to move the instance to any zone where a GPU is available.


Thanks for helping. The issue ended up being that I somehow (accidentally) blocked port 22, so I was effectively unable to SSH in. I created a new instance and all is well.
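
For anyone who hits the same thing: rather than rebuilding the instance, it should also be possible to just add the SSH firewall rule back. A sketch, assuming the instance is on the default network (the rule name is arbitrary):

```
# See which firewall rules exist (and which one is blocking port 22).
gcloud compute firewall-rules list

# Re-allow inbound SSH on the default network.
gcloud compute firewall-rules create default-allow-ssh \
    --network=default --allow=tcp:22 --source-ranges=0.0.0.0/0
```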

I’ve edited my VM instance to remove the GPU, and now I can start it without any resource issue.
This could be an option if you can work on CPU only or just want to get your data off GCP (like me).


How do we run tmux, given that the Jupyter server is already running when I start the instance?

Finally, I was able to launch an instance using your utility. Thanks a lot for your help, and for making such a nice tool.


@czechcheck I received my GCP credits. Thanks a lot for helping out fellow learners.

Reptyr is the tool. :slight_smile: Although your notebooks won’t stop even if you exit the SSH session; Jupyter keeps running in the background. I had said in previous replies that they stop. Apologies for that.

I lost my internet connection at one point, and when it was back up I could not reconnect to Jupyter without SSHing in again and starting over, re-executing my notebook cells up to where I had been. Running tmux protects you from that: any running processes on the server keep running, so my Jupyter notebooks would still work in the browser.
Reading up on Reptyr now to see how it works and whether it can help.
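
In case it helps anyone, this is roughly how to run the notebook server inside tmux so a dropped connection doesn’t take it down (session name and port are just examples):

```
# On the instance: start a named tmux session and run Jupyter inside it.
tmux new -s fastai
jupyter notebook --no-browser --port=8888

# Detach with Ctrl-b d; the server keeps running.
# After a dropped connection, SSH back in and re-attach:
tmux attach -t fastai
```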


I’m a little confused here.
In my experience, when you drop the session the Jupyter server should still be running, and your notebook along with it.
The only issue I have been having is that you lose any output of currently running cells.
Am I wrong in thinking those cells are still running? I’ve never waited them out, I’m too impatient.

When I spin up my instance again I will explicitly test this out… When it happened to me yesterday I simply could not get the notebook to connect again without running the gcloud ssh command again. Is it because when I SSH I’m mapping the ports to localhost… that’s probably why. :wink:
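
That would explain it: the `-L` port forward only lives as long as the SSH session itself, so the notebook URL on localhost dies with it even though Jupyter is still running on the server. A sketch of the command I mean (instance name, zone, and port are placeholders):

```
# Forward the remote Jupyter port to localhost; when this SSH session drops,
# http://localhost:8888 stops responding until you reconnect the tunnel.
gcloud compute ssh my-fastai-instance --zone=us-west2-b -- -L 8888:localhost:8888
```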

It happens; you did nothing wrong. It just happens sometimes. GCP probably has a more limited number of GPUs than AWS (this needs verification), so when too many people start their instances, the zone runs out of GPUs.

I would recommend having multiple instances set up in different zones. A simple way to do this is to create a snapshot of your current instance’s disk, then create the same kind of instance from that snapshot in different zones. When the resource is in demand, simply try all of them.
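
Something like the following, with placeholder names, zones, and GPU type (check which accelerators the target zone actually offers); a snapshot is usable across zones, so you can build a disk and instance from it anywhere that has capacity:

```
# Snapshot the current boot disk.
gcloud compute disks snapshot my-fastai-disk \
    --zone=us-west2-b --snapshot-names=fastai-snap

# Create a disk from the snapshot in another zone...
gcloud compute disks create my-fastai-disk-w1 \
    --zone=us-west1-b --source-snapshot=fastai-snap

# ...and an instance on top of it there.
gcloud compute instances create my-fastai-instance-w1 \
    --zone=us-west1-b \
    --machine-type=n1-highmem-8 \
    --accelerator=type=nvidia-tesla-p4,count=1 \
    --maintenance-policy=TERMINATE \
    --disk=name=my-fastai-disk-w1,boot=yes,auto-delete=no
```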

GCP is great in terms of UX and pricing, but in some sense it under-prices its instances and, as a result, they are too popular and sold out. That is the price we have to pay for it.

That’s a good one.
But I think what happened last weekend was some issue they had internally.
We couldn’t start normal instances either.

Did you find an official explanation of the service outage? I’d love to know.