Well… turns out I fixed the problem. Originally, I had been using the “Google Cloud Shell” that you can activate with the button in the top-right corner of the console once you start your instances (it comes up at the bottom of the screen).
Turns out that if I do it properly and use the “Google Cloud SDK Shell”, i.e. the one you download and install on your own computer, it works like a charm when following the tutorials. I guess the lesson learnt is to follow the tutorials properly.
Thanks everyone for your help - really appreciate it
arunoda has a GCP tool/script that seems to work well. I solved my connection issue by using one of two alternative web addresses: www.crestle.ai or colab.google.com. I don’t want to work harder on my setup than I do on the actual course material.
I have just started using this tool by @arunoda. It lets you change the instance type and zone on the fly, so you should never be locked out from accessing your code: you can always spawn a non-GPU instance.
Also, did you try without the preemptible option? That should be the same as using a dedicated instance, as opposed to a spot instance, in AWS.
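For reference, here is a minimal sketch of what that looks like with the gcloud CLI. The instance name, zone, machine type, and GPU type below are all placeholders I made up, not values from the course setup:

```bash
# Minimal sketch: an on-demand (non-preemptible) GPU instance.
# All names and values here are hypothetical; adjust to your own setup.
gcloud compute instances create fastai-instance \
  --zone=us-west2-b \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-k80,count=1 \
  --maintenance-policy=TERMINATE \
  --restart-on-failure
# Adding --preemptible to the command above gives you the GCP
# equivalent of an AWS spot instance instead.
```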
Today I got the same problem: no resources for my instance in the us-west2-b zone. So this is what I did to make my instance setup dynamic, as per this blog post.
You can follow these steps if you already have an instance created as per the official procedure (an instance with a specific GPU attached to it):
1. Go to VM Instances, edit your instance, and deselect the option ‘Delete boot disk when instance is deleted’.
2. You can now delete your VM instance. This will not delete the disk.
3. Follow the blog post and create (a) the network and (b) the firewall rules.
4. You can then use the gcloud beta compute command to create an instance with your required choice of GPU (or without one) and attach it to the existing disk (see the sketch just below). That way you can ALWAYS get access to your code.
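To make step 4 concrete, here is a rough sketch. The instance names, disk name, zone, and GPU type are hypothetical placeholders:

```bash
# Step 1 can also be done from the CLI instead of the console
# (instance and disk names are hypothetical):
gcloud compute instances set-disk-auto-delete my-fastai-instance \
  --zone=us-west2-b --disk=fastai-boot-disk --no-auto-delete

# Step 4: recreate an instance with (or without) a GPU, reusing the disk:
gcloud beta compute instances create fastai-gpu \
  --zone=us-west2-b \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-p4,count=1 \
  --maintenance-policy=TERMINATE \
  --disk=name=fastai-boot-disk,boot=yes,auto-delete=no
```

One caveat: persistent disks are zonal, so reattaching like this only works within the disk’s own zone; moving to another zone goes through a snapshot.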
Though it is possible to move instances between zones, I have created a new project in a different location and will use whichever one has GPU resources available.
Let me know if this helps your case.
EDIT: I ran into environment-related issues. When I tried again, the fastai-shell commands worked (earlier they didn’t). That is the better option, as it allows you to move the instance to any zone where a GPU is available.
I lost my internet connection at one point, and when it was back up I could not reconnect to jupyter without ssh’ing in again and starting all over, re-executing my notebook cells up to where I had been. Running tmux protects you from that: any running processes on the server keep running, so my jupyter notebooks would still work in the browser.
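In case it helps, this is roughly how I use it (the session name is just whatever you pick, and I’m assuming jupyter on port 8888):

```bash
# On the server: start a named tmux session
tmux new -s jupyter
# Inside the session, launch jupyter
jupyter notebook --no-browser --port=8888
# Detach with Ctrl-b then d (or simply lose your connection);
# jupyter keeps running. After ssh'ing back in, reattach with:
tmux attach -t jupyter
```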
Reading up on Reptyr now to see how it works and how it can help.
I’m a little confused here.
In my experience when you drop the session the jupyter server should still be running and your notebook along with it.
The only issue I have been having is that you lose any output of currently running cells.
Am I wrong in thinking those cells are still running? I’ve never waited them out; I’m too impatient.
When I spin up my instance again I will explicitly test this out… When it happened to me yesterday, I simply could not get the notebook to connect again without running the gcloud ssh command again. Is it because when I ssh I’m mapping the ports to localhost? That’s probably why.
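That would explain it: the notebook is only reachable on localhost while the ssh tunnel is alive, so a dropped connection kills the tunnel even though jupyter itself keeps running on the server. A sketch of the tunnel (the instance name and zone are hypothetical, and I’m assuming jupyter is on port 8888):

```bash
# Everything after "--" is passed straight to ssh; -L forwards
# local port 8888 to port 8888 on the instance.
gcloud compute ssh my-fastai-instance --zone=us-west2-b -- -L 8888:localhost:8888
```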
It happens; you did nothing wrong. It just happens sometimes. GCP probably has a more limited number of GPUs than AWS (this needs verification), so when too many people start their instances, GCP runs out of GPUs.
I would recommend having multiple instances set up in different zones. A simple way to go would be to create a snapshot of your current instance’s disk, then create the same kind of instance from that snapshot in different zones (see the sketch below). When the resource is in demand, simply try all of them.
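A rough sketch of that, with hypothetical disk, snapshot, and instance names and zones:

```bash
# 1. Snapshot the current boot disk
gcloud compute disks snapshot fastai-boot-disk \
  --zone=us-west2-b --snapshot-names=fastai-snap

# 2. Create a disk from the snapshot in another zone
gcloud compute disks create fastai-boot-west1 \
  --zone=us-west1-b --source-snapshot=fastai-snap

# 3. Create an instance in that zone using the new disk as its boot disk
gcloud compute instances create fastai-west1 \
  --zone=us-west1-b \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-k80,count=1 \
  --maintenance-policy=TERMINATE \
  --disk=name=fastai-boot-west1,boot=yes
```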
GCP is great in terms of UX and pricing, but in some sense it under-prices its instances and, as a result, they are too popular and sell out. That is the price we have to pay for it.