Platform: Google Cloud Platform (GCP)

It works like a charm. Thank you.

OMG. I finally did it after a lot of fiddling. I hadn’t noticed the preemptible error (it wasn’t that obvious on the console), so the reason I couldn’t ssh in was that the instance wasn’t running. So obvious now. And whilst troubleshooting I’d switched from using my native terminal to the CLI in Google Cloud. For some reason starting a Jupyter Notebook from that CLI doesn’t work (localhost:8080 doesn’t connect). Finally I tried from the native terminal and it all worked.

Summary: follow the instructions, except don’t set up a preemptible instance.

At least I’ve learnt quite a lot about the gcloud commands and Google Cloud in general whilst fiddling around with this.


Last year I had lots of issues with my instances constantly getting preempted. It probably doesn’t help that everybody in this course is using --zone=us-west1-b so it might be useful to spread out to other zones (but make sure to check GPU availability in each zone and zone-specific GPU pricing).
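
For anyone wanting to check availability before switching zones, here is a minimal sketch built on `gcloud compute accelerator-types list`. The helper name `zones_with_gpu` is made up for illustration, and the GPU type string is the one used elsewhere in this thread. Pricing still has to be looked up separately.

```shell
# List the zones that offer a given GPU type.
# zones_with_gpu is a hypothetical helper name, not part of gcloud.
zones_with_gpu() {
    gcloud compute accelerator-types list --filter="name=$1"
}

# Example call (needs an authenticated gcloud):
#   zones_with_gpu nvidia-tesla-p100
```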


I’m sure others have already thought of this, but I have forgotten to stop my VM instance twice since setting it up two days ago. I’ve written a one-line bash script to stop the instance and have scheduled it to run with cron at 2am each day. That means I can be billed for at most 20 hours if I forget to stop it.
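
The script itself wasn’t posted, so the following is only a sketch of how such a setup might look. The file name `stop-fastai.sh`, its path in the crontab line, and the function name are assumptions; the instance name and zone are the ones from the commands in this post. Note that cron runs with a minimal PATH, so the full path to gcloud may be needed.

```shell
#!/usr/bin/env bash
# stop-fastai.sh (hypothetical name): stop a Compute Engine VM instance.

stop_instance() {
    # $1 = instance name, $2 = zone
    gcloud compute instances stop "$1" --zone="$2"
}

# Do nothing unless both arguments are supplied.
if [ "$#" -eq 2 ]; then
    stop_instance "$1" "$2"
fi

# Assumed crontab entry (added via `crontab -e`) to run it at 2am every day:
#   0 2 * * * /home/alex/stop-fastai.sh alex-fastai-instance us-west1-b
```

Passing the name and zone as arguments keeps the script reusable if the instance is later recreated in another zone.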

I’ve also created a couple of aliases so that I don’t have to type long commands in the terminal to start the process and the tunnel. So instead of:

gcloud compute instances start alex-fastai-instance --zone=us-west1-b
gcloud compute ssh --zone=us-west1-b jupyter@alex-fastai-instance -- -L 8080:localhost:8080

I just type the following into the terminal to get started with the Jupyter notebook.

startFAI
GCssh

Hope this might help somebody save a few keystrokes or some billable hours.
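
The alias definitions weren’t included in the post. Assuming they simply wrap the two commands shown above, they could look something like this in `~/.bashrc` or `~/.zshrc`:

```shell
# Assumed definitions; the originals were not posted.
# Start the VM instance.
alias startFAI='gcloud compute instances start alex-fastai-instance --zone=us-west1-b'
# SSH in and forward local port 8080 to Jupyter on the instance.
alias GCssh='gcloud compute ssh --zone=us-west1-b jupyter@alex-fastai-instance -- -L 8080:localhost:8080'
```

Open a new terminal (or `source` the file) for the aliases to take effect.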


I have just published a setup guide for Amazon SageMaker.


So putting that all together (retrospectively, hope I’m not missing anything important), I have:

  • Step 1: Creating your account

  • Step 2: Install Google CLI

  • Step 3: Create an instance (I didn’t use the --preemptible option)

  • Step 4:
    In the command line clone the course-v4 repository with:
    git clone https://github.com/fastai/course-v4
    and the fastbook repository with
    git clone https://github.com/fastai/fastbook.git
    Note: A git pull within the course-v4 folder or the ‘fastbook’ folder keeps them up to date.
    Also don’t forget to install fastai with:
    pip install fastai2
    and, after navigating to the course-v4 folder, its dependencies with:
    pip install -r requirements.txt

  • Step 5: Stop an instance
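
The clone and update commands from Step 4 can be folded into one helper that clones on first use and pulls afterwards. This is only a sketch, not part of the official instructions, and `sync_repo` is a made-up name.

```shell
# Clone a repository on first use, update it with git pull afterwards.
# sync_repo is a hypothetical helper name.
sync_repo() {
    repo_url="$1"
    repo_dir="$(basename "$repo_url" .git)"   # e.g. fastbook.git -> fastbook
    if [ -d "$repo_dir/.git" ]; then
        git -C "$repo_dir" pull
    else
        git clone "$repo_url"
    fi
}

# Example calls:
#   sync_repo https://github.com/fastai/course-v4
#   sync_repo https://github.com/fastai/fastbook.git
```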


Trying to clone the latest course repository on gcloud does this. Has anyone seen a similar issue?

Did you try using https to clone the repository instead of ssh? The command for that is in my post above.


https worked bro. Thanks.


I’m getting the error “sudo: /opt/anaconda3/bin/conda: command not found” in response to

sudo /opt/anaconda3/bin/conda install -c fastai fastai

in step 4 of https://course.fast.ai/start_gcp.html.

Does anyone have a solution?

Also, my remote host keeps closing my connection every 10 minutes or so, which results in Jupyter Notebook kernel connection and autosave issues.

Any recommendations on how to fix this? Thanks!!

That was happening to me as well. You can run this to find preemption events:

gcloud compute operations list --filter="operationType=compute.instances.preempted"

You’ll probably see that your instance was preempted at the time the connection was closed. I started over with a new, non-preemptible instance. To do so, drop --preemptible from the command you use to create an instance.
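
Before recreating anything, it may be worth confirming how the existing instance is configured. A small sketch, assuming the instance name and zone are passed in; `is_preemptible` is a hypothetical helper name, and it reads the `scheduling.preemptible` field from `gcloud compute instances describe`:

```shell
# Print whether an instance was created as preemptible.
# is_preemptible is a hypothetical helper name, not part of gcloud.
is_preemptible() {
    # $1 = instance name, $2 = zone
    gcloud compute instances describe "$1" --zone="$2" \
        --format="value(scheduling.preemptible)"
}

# Example call (needs an authenticated gcloud):
#   is_preemptible my-fastai-instance us-west1-b
```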


@steef, I think you want to install fastai2 and not fastai, am I right? The command you are using will install fastai and not fastai2.

I recommend using pip, as Jeremy stated:
pip install fastai2

Here are all the steps in my post above:

Hope it’s useful, and that I have not left anything out. Do let me know, should you try it out.


Thanks! I deleted my instance and started over with a clean instance without --preemptible, which seems to work so far…

Yes! Actually your instructions above (earlier in this thread) are amazing; I followed those and everything is looking great so far :) Thanks!


I am unable to open this link to access the notebooks:
http://localhost:8080/tree
I can see the instance running from the console, and I am able to access them through the terminal too. Not sure how to fix this.
When I try to connect using this command:
gcloud compute ssh --zone="us-west1-b" jupyter@"my-fastai-instance" -- -L 8080:localhost:8080
it connects, but with a message that says “could not request local forwarding”.


Can someone please help?

I dug around a little to find what local forwarding is.

  1. Landed here: https://manas.tungare.name/blog/ssh-port-forwarding-on-mac-os-x
  2. Used the ‘lsof -i -P’ command to find existing ‘jobs’ that had an ‘8080’ description (I really don’t know if any of this terminology is accurate).
  3. Used ‘kill -9’ to kill those jobs.
  4. Reconnected using the 8080 port.

And it works now.
Sometimes, I feel like most of what I do is just hoping that something works out.
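
The lsof and kill steps above can be sketched as two small helpers. The names `pids_on_port` and `free_port` are made up for illustration, and this version sends a plain `kill` (SIGTERM) rather than the `kill -9` used in the post, on the assumption that a polite request is usually enough for a stale ssh tunnel.

```shell
# Print the PIDs of processes listening on a local TCP port.
# (lsof -t prints bare PIDs; -sTCP:LISTEN keeps only listeners.)
pids_on_port() {
    lsof -t -i "tcp:$1" -sTCP:LISTEN
}

# Ask each of those processes to exit.
free_port() {
    for pid in $(pids_on_port "$1"); do
        kill "$pid"
    done
}

# Example call before reconnecting the tunnel:
#   free_port 8080
```

If a process ignores the request, `kill -9` on its PID remains the fallback.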

@jeremy FYI: N2D machines no longer support the west zone or the P100 GPU. You might want to update your documentation. @rachel FYI too.

After reading some GCP docs I realized that N2D machines are in beta, are no longer supported in the west zone, and no longer support the P100 GPU.

I got the following setup to work; it has a little more memory than the recommended setup but the same GPU.

export IMAGE_FAMILY="pytorch-latest-gpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="my-fastai-instance"
# It seems like the N2D machines are in beta and are no longer available in all zones + not working with p100 anymore.
export INSTANCE_TYPE="n1-highmem-16"

# --preemptible is deliberately left out: it gave me issues before (described in this thread too).
gcloud compute instances create "$INSTANCE_NAME" \
        --zone="$ZONE" \
        --image-family="$IMAGE_FAMILY" \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --accelerator="type=nvidia-tesla-p100,count=1" \
        --machine-type="$INSTANCE_TYPE" \
        --boot-disk-size=200GB \
        --metadata="install-nvidia-driver=True"

Hey, I recently moved to GCP from Paperspace.


I’m getting this error even though I opted for the passphrase, and I have no clue what the problem is. I’m getting the key from Compute Engine --> Metadata --> SSH.
I have also tried getting it from /Users/<sys_username>/.ssh/google_compute_engine.pub.
Need help!

@steef I believe the OP wasn’t a wiki; I’ve fixed that. Could you please add these to the top post, maybe under a new section?

You could also open a PR for the GCP setup instructions if you’d like.

Thank you!
