Platform: GCP ✅

gamo · January 9, 2019, 11:35am

If you install the Google Cloud Console app, you will have access to all instance settings and be able to start an SSH session. The start-SSH option is hidden in the menus, but it is there.

You will not be able to start a Jupyter session in a browser through that SSH console, though.

dhoa · January 9, 2019, 11:49am

Thanks. It works. So now I can stop the instance more easily and save my budget.
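For anyone who prefers the command line to the app, stopping an instance can be sketched as below. This is a minimal sketch, assuming gcloud is installed and authenticated; the function name stop_instance is hypothetical, and the instance name and zone are whatever you chose when creating the instance.

```shell
# Stop a Compute Engine instance so it no longer accrues compute charges.
# $1: instance name, $2: zone (both are the values used at creation time).
stop_instance() {
    gcloud compute instances stop "$1" --zone="$2"
}

# Example call, using the variable names from the course setup:
#   stop_instance "$INSTANCE_NAME" "$ZONE"
# To confirm the result, list instances with their status:
#   gcloud compute instances list --format="table(name,zone.basename(),status)"
```

A stopped instance still incurs charges for its boot disk, so this reduces cost rather than removing it entirely.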

Alex4 · January 26, 2019, 1:59pm

Hey all … please help … I am from Europe and get the error below in Step 3 … I basically just replaced the ZONE value with the region associated with my country.

ERROR: (gcloud.compute.instances.create) Could not fetch resource: - Invalid value ‘“europe-west4-a”’. Values must match the following regular expression: ‘[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?’

Best Regards, Alex
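One possible reading of this error: the value shown in the message appears to include the quote characters themselves, and a zone name that contains quotes cannot match the pattern. The sketch below checks a string against the same pattern, rewritten as a POSIX extended regular expression because grep -E does not accept the "(?:" group syntax. The function name is_valid_zone_name is hypothetical, and this only checks the format, not whether the zone exists or offers a given GPU.

```shell
# Return 0 when the argument matches the name pattern from the error message.
# The pattern is the POSIX ERE form of [a-z](?:[-a-z0-9]{0,61}[a-z0-9])?
is_valid_zone_name() {
    printf '%s\n' "$1" | grep -Eq '^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$'
}

# A plain value passes the check.
is_valid_zone_name 'europe-west4-a' && echo "accepted: plain value"

# A value that carries literal quote characters is rejected.
is_valid_zone_name '"europe-west4-a"' || echo "rejected: value contains quote characters"
```

If the check fails for your value, retyping the export line by hand with plain ASCII quotes is one thing to try.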

szelee · January 27, 2019, 9:26am

I’ve just encountered a similar problem as well. You’ll need to refer to the GPUs for compute workloads table at https://cloud.google.com/compute/docs/gpus/ to find the zones where your required GPU is available.

In your case, P4 is not available in europe-west4-a but is available in:

europe-west4-b
europe-west4-c

while K80 is available in:

europe-west1-b
europe-west1-d

Hope this helps.
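The same lookup can also be done from the command line instead of the documentation table. This is a minimal sketch, assuming gcloud is installed and authenticated against a project with the Compute Engine API enabled; the function name list_gpu_zones is hypothetical, and availability changes over time, so the output may differ from the zones listed above.

```shell
# Print the zones that offer a given accelerator type, one zone per line.
# $1: accelerator type name, for example nvidia-tesla-p4 or nvidia-tesla-k80.
list_gpu_zones() {
    gcloud compute accelerator-types list \
        --filter="name=$1" \
        --format="value(zone.basename())"
}

# Example: show only European zones that offer the P4.
#   list_gpu_zones nvidia-tesla-p4 | grep '^europe-'
```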

Alex4 · January 27, 2019, 9:48am

Thanks for the response! That makes sense! I managed to get it running with the standard US location. It seems it does not have to be my region!

cheers, Alex

jankelowitz · January 28, 2019, 3:16pm

Anyone come across a good tutorial / guide on setting up Jupyter checkpoints for preemptible instances? Many thanks in advance.

glot · January 30, 2019, 8:12pm

After creating an account, there is a default project, “My First Project”. It appears that gcloud init requires a project to be selected, so I selected this one.
In step 3 “Create an instance” of the tutorial, after entering all the gcloud compute instances create $INSTANCE_NAME … info, the prompt is:

API [compute.googleapis.com] not enabled on project [836105925636]. 
Would you like to enable and retry (this will take a few minutes)?
(y/N)?

It seems strange that all of this FastAI configuration for an instance would be enabled for the default project “My First Project”. Should I:

just click y
create a new, say, “FastAI” project and then enter gcloud compute instances create $INSTANCE_NAME …
do something else (what?)

adpostma · February 1, 2019, 12:28pm

I had the same problem in Nov 2018 (see my post above in this discussion). After upgrading my account, I could connect to my instance again. I have had no problems since.

hud · February 1, 2019, 9:18pm

Hello

Sorry for this naive question, but where is the fastai library located within the GCP setup?

I followed the setup exactly according to this: https://course.fast.ai/update_gcp.html#update-the-fastai-library

My fastai library is also up to date. I just want to look at what’s behind the scenes and check what has changed.

Hope this makes sense!

phithor · February 1, 2019, 9:44pm

If you want to see the source code, the best option is to look at the GitHub repo.

ringoo · February 2, 2019, 12:47am

The MNIST dataset from lesson 1 takes unusually long on my instance. I was used to 10 seconds per epoch (with another tutorial on a GTX970), but now it is 3 minutes with the P4.

I wonder if I made a mistake when setting up my GCP fastai instance.
The installation worked without errors with the command:

>     export IMAGE_FAMILY="pytorch-latest-gpu" # or "pytorch-latest-cpu" for non-GPU instances
>     export ZONE="europe-west4-b" # budget: "us-west1-b"
>     export INSTANCE_NAME="my-fastai-instance"
>     export INSTANCE_TYPE="n1-highmem-8" # budget: "n1-highmem-4"
> 
>     # budget: 'type=nvidia-tesla-k80,count=1'
>     gcloud compute instances create $INSTANCE_NAME \
>             --zone=$ZONE \
>             --image-family=$IMAGE_FAMILY \
>             --image-project=deeplearning-platform-release \
>             --maintenance-policy=TERMINATE \
>             --accelerator="type=nvidia-tesla-p4,count=1" \
>             --machine-type=$INSTANCE_TYPE \
>             --boot-disk-size=200GB \
>             --metadata="install-nvidia-driver=True" \
>             --preemptible

During training, the CPU usage is 800% and nvidia-smi is not installed on the instance. Should this worry me?
How can we check if the GPU is busy?

ringoo · February 2, 2019, 7:23am

After restarting the instance, the nvidia-smi command is now available and training goes much faster!
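On the question of how to check whether the GPU is busy, a small wrapper around nvidia-smi is one option. This is a minimal sketch, assuming the NVIDIA driver has finished installing; the function name gpu_status is hypothetical. If nvidia-smi is missing, as in the post above, it reports that instead of failing silently.

```shell
# Print GPU name, utilization and memory use once, in CSV form.
# Returns 1 when nvidia-smi is not on the PATH (driver probably not installed yet).
gpu_status() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the GPU driver may not be installed yet" >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv
}

# To watch the numbers refresh every second while a notebook is training:
#   watch -n 1 nvidia-smi
# To ask PyTorch directly whether it can see a GPU:
#   python -c "import torch; print(torch.cuda.is_available())"
```

High CPU usage together with GPU utilization near 0% during training would suggest that the model is running on the CPU.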

amqdn · February 2, 2019, 9:37pm

I ran find -name fastai at the terminal and found ./opt/anaconda3/src/fastai.
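An alternative to searching the disk is to ask Python where it imports the package from, which also confirms that the copy you found is the one actually in use. This is a minimal sketch; the function name package_dir is hypothetical, and it assumes the interpreter you pass (python3 by default) is the same one your notebooks run.

```shell
# Print the directory the given Python interpreter imports a package from.
# $1: importable package name; $2 (optional): interpreter, defaults to python3.
package_dir() {
    "${2:-python3}" -c 'import importlib, os, sys
module = importlib.import_module(sys.argv[1])
print(os.path.dirname(module.__file__))' "$1"
}

# Example on the instance:
#   package_dir fastai
```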

amqdn · February 2, 2019, 9:51pm

You may have already figured this out, but in case you didn’t:

The issue you’re encountering is not a fastai thing, it’s a GCP thing. When you first work on GCP, certain services, such as the Compute Engine API, need to be enabled. APIs are enabled per project, so answering y enables it for the project you selected, and a new project would need it enabled again.
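Answering y at the prompt is enough, but the API can also be enabled explicitly before creating the instance. This is a minimal sketch, assuming gcloud is authenticated; the function name enable_compute_api is hypothetical and the project ID is a placeholder for your own.

```shell
# Enable the Compute Engine API for one project.
# $1: project ID (placeholder: use the ID shown by "gcloud projects list").
enable_compute_api() {
    gcloud services enable compute.googleapis.com --project="$1"
}

# To see which project gcloud currently targets:
#   gcloud config get-value project
# Example call:
#   enable_compute_api "$(gcloud config get-value project)"
```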

kwong · February 5, 2019, 3:32am

I set up my Google Cloud instance as described in the course guide, and I received an email from Google 2 days ago saying my GPU quota increase had been approved (indeed, it says my limit is now 1). However, when I try running this command: gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080 I get ERROR: (gcloud.compute.ssh) Instance [my-fastai-instance] in zone [us-west2-b] has not been allocated an external IP address yet. Try rerunning this command later.

The guide didn’t say anything about setting an external IP address (though I see there is some documentation from Google on reserving a static IP address, and it says one should automatically be assigned when you create a new instance). Has anyone seen this before?

kwong · February 5, 2019, 12:39pm

Update: I added an external IP address per the instructions in this article, but now I am getting return code 255 when I try to run the ssh command: ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].

I decided to just create a new instance and now everything seems to be working fine.
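For anyone hitting the same message, it can help to check whether the instance has an external IP before opening the tunnel. This is a minimal sketch, assuming gcloud is authenticated and the instance has a single network interface; the function names external_ip and connect_jupyter are hypothetical.

```shell
# Print the external (NAT) IP of an instance; prints nothing if none is assigned.
# $1: instance name, $2: zone.
external_ip() {
    gcloud compute instances describe "$1" --zone="$2" \
        --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
}

# Open the SSH tunnel for Jupyter only when an external IP exists.
# $1: instance name, $2: zone.
connect_jupyter() {
    ip=$(external_ip "$1" "$2")
    if [ -z "$ip" ]; then
        echo "No external IP on $1 yet; is the instance running?" >&2
        return 1
    fi
    gcloud compute ssh --zone="$2" "jupyter@$1" -- -L 8080:localhost:8080
}

# Example call:
#   connect_jupyter "$INSTANCE_NAME" "$ZONE"
```

A stopped instance with an ephemeral address reports no external IP, so checking the instance status first may explain the error without any network changes.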

nikola · February 5, 2019, 4:02pm

Hi all, I’m trying to install nbextension - collapsible headings but I get this error:

[Errno 13] Permission denied: '/opt/anaconda3/lib/python3.7/site-packages/conda-4.6.2-py3.7.egg-info/PKG-INFO' -> '/opt/anaconda3/lib/python3.7/site-packages/conda-4.6.2-py3.7.egg-info/PKG-INFO.c~'

The command I’m running is conda install -c conda-forge jupyter_nbextensions_configurator
from the installation instructions page https://github.com/Jupyter-contrib/jupyter_nbextensions_configurator#installation.

Anyone managed to install any nbextension on GCP?
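The Errno 13 message suggests that the account running conda cannot write to /opt/anaconda3. One workaround to try is a per-user install with pip instead of changing the shared Anaconda tree. This is a sketch under that assumption and has not been confirmed on the GCP image; the function names can_write and install_configurator are hypothetical, and the default path is taken from the error message above.

```shell
# Return 0 when the current user can write to the given directory.
can_write() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Install the configurator with conda when the shared tree is writable,
# otherwise fall back to a per-user pip install.
# $1 (optional): site-packages directory to test.
install_configurator() {
    site_packages=${1:-/opt/anaconda3/lib/python3.7/site-packages}
    if can_write "$site_packages"; then
        conda install -y -c conda-forge jupyter_nbextensions_configurator
    else
        pip install --user jupyter_nbextensions_configurator &&
            jupyter nbextensions_configurator enable --user
    fi
}
```

Jupyter needs a restart after the install before the configurator tab appears.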

satyaveera · February 6, 2019, 7:05pm

I got the same error, but running on an Azure DSVM.

le_mack · February 7, 2019, 3:07pm

Thanks for the tip here on the git stash command. Cheers.

cmbest · February 10, 2019, 5:59pm

Hello~ Have you solved this problem? I tried to add the SSH key to the instance, but the format of the key seems to be wrong for the instance :(. Could you share the method to fix this if you have solved it? Big thanks in advance.