Platform: GCP ✅

Thanks

Hi Paul,

Today I hit the same problem: no resources for my instance in the us-west2-b zone. Here is what I did to make my instance setup dynamic, following this blog post.

You can follow these steps if you already have an instance created per the official procedure (an instance with a specific GPU attached to it).

  1. Go to VM Instances, click Edit, and deselect the option ‘Delete boot disk when instance is deleted’.
  2. You can now delete your VM instance. This will not delete the disk.
  3. Follow the blog post and create (a) the network and (b) the firewall rules.
  4. You can then use the gcloud beta compute command to create an instance with your choice of GPU (or without one) and attach it to the existing disk (see the sketch after this list). That way you ALWAYS have access to your code.
  5. Though it is possible to move instances between zones, I have created a new project in a different location and will use whichever one has GPU resources available.
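As a minimal sketch of step 4 (the instance name, zone, machine type, GPU type, and disk name below are hypothetical placeholders, not from the original post; substitute your own):

```bash
# Create a GPU instance attached to the existing boot disk.
gcloud beta compute instances create fastai-instance \
    --zone=us-west1-b \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-p4,count=1 \
    --maintenance-policy=TERMINATE \
    --disk=name=my-fastai-disk,boot=yes
```

Dropping the `--accelerator` flag gives you a CPU-only variant of the same instance, still attached to the same disk.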

Let me know if this helps your case.

EDIT: I ran into environment-related issues. When I tried again, the fastai-shell commands worked (earlier they didn’t). That is the better option, as it lets you move the instance to any zone where a GPU is available.


Thanks for helping. The issue ended up being that I had somehow (accidentally) blocked port 22, so I was effectively unable to SSH in. I created a new instance and all is well.

I edited my VM instance to remove the GPU, and then I could start it without any resource issues.
This could be an option if you can work on a CPU, or if you just want to get your data off GCP (like me).


How do we use tmux when the Jupyter server is already running by the time I start the instance?

I was finally able to launch an instance using your utility. Thanks a lot for your help, and for making such a nice tool.


@czechcheck I received my GCP credits. Thanks a lot for helping out fellow learners.

reptyr is the tool. 🙂 That said, your notebooks won’t stop even if you SSH out of the session; Jupyter keeps running in the background. I had said in previous replies that they stop. Apologies for that.
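A rough sketch of how reptyr can pull an already-running Jupyter process into a tmux session (the PID lookup and the sysctl workaround are assumptions about a typical Linux setup, not from the original post):

```bash
# Find the PID of the running Jupyter server
pgrep -f jupyter

# Start a tmux session, then reparent the process into it
tmux new -s jupyter
sudo reptyr <PID>
# If reptyr reports a ptrace permission error, this usually helps:
# sudo sysctl -w kernel.yama.ptrace_scope=0
```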

I lost my internet connection at one point, and when it came back I could not reconnect to Jupyter without SSH’ing in again and re-executing my notebook cells up to where I had been. Running tmux protects you from that: any processes running on the server keep running, so my Jupyter notebooks would still work in the browser.
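For reference, a minimal tmux workflow for this (the session name and port are made up for illustration):

```bash
tmux new -s fastai                            # start a named session
jupyter notebook --no-browser --port=8888     # run Jupyter inside it
# Detach with Ctrl-b d; the server keeps running. Later, reattach:
tmux attach -t fastai
```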
I’m reading up on reptyr now to see how it works and whether it can help.


I’m a little confused here.
In my experience, when you drop the session, the Jupyter server should still be running, and your notebook along with it.
The only issue I’ve had is that you lose any output from currently running cells.
Am I wrong in thinking those cells are still running? I’ve never waited them out; I’m too impatient.

When I spin up my instance again I will explicitly test this out. When it happened to me yesterday, I simply could not get the notebook to connect again without re-running the gcloud ssh command. Is it because SSH maps the ports to localhost? That’s probably why. 😉
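That matches how the usual tunnel works; a sketch with hypothetical instance and zone names:

```bash
# Forward the remote Jupyter port to localhost over SSH.
# If this tunnel drops, http://localhost:8888 stops responding even
# though Jupyter itself is still running on the server.
gcloud compute ssh my-fastai-instance --zone=us-west2-b -- -L 8888:localhost:8888
```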

It happens; you did nothing wrong. GCP probably has a more limited pool of GPUs than AWS (this needs verification), so when too many people start their instances at once, it runs out of GPUs.

I would recommend setting up multiple instances in different zones. A simple approach is to create a snapshot of your current instance’s disk, then create the same kind of instance from it in different zones. When the resource is in demand, simply try all of them.
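A sketch of the snapshot step (the disk, zone, and snapshot names are placeholders):

```bash
# Snapshot the current boot disk, then materialize it in another zone.
gcloud compute disks snapshot my-fastai-disk --zone=us-west2-b \
    --snapshot-names=fastai-snap
gcloud compute disks create fastai-disk-west1 --zone=us-west1-b \
    --source-snapshot=fastai-snap
# Then create an instance in us-west1-b using the new disk as its boot disk.
```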

GCP is great in terms of UX and pricing, but in some sense it underprices its instances; as a result, they are too popular and sell out. That is the price we pay for it.

That’s a good one.
But I think what happened last weekend was some internal issue on their side.
We couldn’t start normal (non-GPU) instances either.

Did you find an official explanation of the service outage? I’d love to know.

Thanks for your great help; I got $500 of credit from your referral. 🙂

I’m not sure about an official update.
But there was a thread on HN about this.
See: https://news.ycombinator.com/item?id=18428497


I’m also having this issue. When the API credentials are not configured correctly, you get a different error (“401 - Not Authorized”); on GCP Compute Engine instances, however, the kaggle command line does not seem to work at all. I got it working on my personal box, but not in the cloud.
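A quick way to distinguish the two failure modes (assuming the standard Kaggle CLI with a token at ~/.kaggle/kaggle.json; this diagnostic is my addition, not from the original post):

```bash
pip install kaggle
# With a valid token this lists competitions. A 401 means the token is
# wrong; a hang or block when run from a GCP IP points elsewhere.
kaggle competitions list
```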

I had the same issue.
Then I created a new instance (with a new IP) and things worked again.

I think Kaggle is doing some IP-level blocking.

Does anyone else suddenly have issues accessing their notebooks? Everything worked 10 hours ago on my GCP instance, but right now I’m getting 403 Forbidden errors on all my notebooks. I’ve tried different browsers, but nothing works.

Any hints?

I had that issue too with the official image, so I switched back to installing fastai from scratch on an Ubuntu image.