Platform: GCP ✅


(Dien Hoa TRUONG) #500

Another question about GCP. Can we SSH into an instance from a phone :smiley:? I found the Cloud Console app, which lets me stop the instance, but I also need to access the instance to see whether training has finished. Sometimes I need to turn off my local machine and want to check the instance from just my phone. Thank you in advance.


(Gabriel) #501

If you install the Google Cloud Console app, you will have access to all instance settings and be able to start an SSH session. The option to start SSH is hidden in the menus, but it is there.

You will not be able to start a Jupyter session in a browser through that SSH console, though.


(Dien Hoa TRUONG) #502

Thanks, it works. Now I can stop the instance more easily and save my budget :smiley:


(Alexander Zehetmaier) #503

Hey all … please help … I am from Europe and get the error below in Step 3 … I basically just replaced the ZONE value with the region associated with my country.

ERROR: (gcloud.compute.instances.create) Could not fetch resource: - Invalid value ‘“europe-west4-a”’. Values must match the following regular expression: ‘[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?’

Best Regards, Alex


#504

I’ve just run into a similar problem. You’ll need to refer to the "GPUs for compute workloads" table at https://cloud.google.com/compute/docs/gpus/ to find the zones where your required GPU is available.

In your case, the P4 is not available in europe-west4-a, but it is available in:

  • europe-west4-b
  • europe-west4-c

while the K80 is available in:

  • europe-west1-b
  • europe-west1-d

Hope this helps.
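
One more thing that may be worth checking, going only by the error text quoted above: the rejected value is shown with quotation marks inside it, so the quotes may have been pasted as typographic characters and ended up as part of the zone name. Below is a minimal sketch for checking a value before passing it to gcloud. The function name `is_valid_zone_name` is made up for illustration, and the pattern is the one from the error message, rewritten without the non-capturing group so that `grep -E` accepts it.

```shell
# Hypothetical helper (not part of gcloud): succeed only if the argument
# matches the name pattern quoted in the error message.
is_valid_zone_name() {
    printf '%s\n' "$1" | grep -Eq '^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$'
}

ZONE="europe-west4-b"
if is_valid_zone_name "$ZONE"; then
    echo "zone name looks fine: $ZONE"
else
    echo "zone name contains characters gcloud will reject: $ZONE"
fi
```

This only checks the spelling of the name. Whether a given GPU is offered in that zone still has to be looked up in the table linked above.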


(Alexander Zehetmaier) #505

Thanks for the response! That makes sense! I managed to get it running with the standard US location. It seems it does not have to be my region!

cheers, Alex :slight_smile:


(Ari Jankelowitz) #506

Anyone come across a good tutorial / guide on setting up Jupyter checkpoints for preemptible instances? Many thanks in advance.


#508

After creating an account, there is a project “My First Project”, and it appears that gcloud init requires a project to be selected, so I selected that one.
In step 3 “Create an instance” of the tutorial, after entering all the gcloud compute instances create $INSTANCE_NAME … info, the prompt is:

API [compute.googleapis.com] not enabled on project [836105925636]. 
Would you like to enable and retry (this will take a few minutes)?
(y/N)?

It seems strange that all of this FastAI configuration for an instance would be enabled for the default project “My First Project”. Should I:

  • just click y
  • create a new, say, “FastAI” project and then enter gcloud compute instances create $INSTANCE_NAME …
  • do something else (what?)

(Ad Postma) #509

I had the same problem in Nov 2018 (see post above in this discussion). After upgrading my account I could connect to my instance again. I have had no problems since.


#510

Hello,

Sorry for this naive question, but where is the fastai library located within the GCP setup?

I followed the setup exactly according to this: https://course.fast.ai/update_gcp.html#update-the-fastai-library

My fastai library is also up to date. I just want to look at what’s behind the scenes and check what has changed.

Hope this makes sense!


(Mathias Thorsen) #511

If you want to see the source code, the best option is to look at the GitHub repo.


(Michael) #513

The MNIST dataset from lesson 1 takes unusually long on my instance. I was used to 10 seconds per epoch (with another tutorial on a GTX 970), but now it is 3 minutes with the P4.

I wonder if I made a mistake when setting up my GCP fastai instance.
The installation worked without errors with the command:

>     export IMAGE_FAMILY="pytorch-latest-gpu" # or "pytorch-latest-cpu" for non-GPU instances
>     export ZONE="europe-west4-b" # budget: "us-west1-b"
>     export INSTANCE_NAME="my-fastai-instance"
>     export INSTANCE_TYPE="n1-highmem-8" # budget: "n1-highmem-4"
> 
>     # budget: 'type=nvidia-tesla-k80,count=1'
>     gcloud compute instances create $INSTANCE_NAME \
>             --zone=$ZONE \
>             --image-family=$IMAGE_FAMILY \
>             --image-project=deeplearning-platform-release \
>             --maintenance-policy=TERMINATE \
>             --accelerator="type=nvidia-tesla-p4,count=1" \
>             --machine-type=$INSTANCE_TYPE \
>             --boot-disk-size=200GB \
>             --metadata="install-nvidia-driver=True" \
>             --preemptible

During training, CPU usage is 800% and nvidia-smi is not installed on the instance. Should this worry me?
How can we check if the GPU is busy?
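
As a rough way to answer that last question, assuming the NVIDIA driver has finished installing, `nvidia-smi` can report GPU utilization and memory use. The wrapper name `gpu_status` below is made up for illustration. If the command is missing, the driver tools may simply not be installed yet.

```shell
# Hypothetical helper: report GPU load, or say that the driver tools are missing.
gpu_status() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # A utilization value near 0% while training is running suggests
        # that the work is happening on the CPU instead of the GPU.
        nvidia-smi --query-gpu=name,utilization.gpu,memory.used --format=csv
    else
        echo "nvidia-smi not found: the NVIDIA driver may not be installed yet"
    fi
}

gpu_status
```

Running it a few times while a training epoch is in progress gives a better picture than a single reading.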


(Michael) #514

After restarting the instance, the nvidia-smi command is now available and training goes much faster!


(Andrew Nguyen) #515

I ran find -name fastai at the terminal and found ./opt/anaconda3/src/fastai.
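
Another way to find where an installed package lives is to ask Python itself. This is only a sketch: it assumes the `python3` on the PATH belongs to the same environment the notebooks use, and the function name `module_path` is made up for illustration.

```shell
# Hypothetical helper: print the file a Python module would be imported from.
module_path() {
    python3 -c '
import importlib.util, sys
spec = importlib.util.find_spec(sys.argv[1])
print(spec.origin if spec else "not found")
' "$1"
}

module_path fastai
```

The directory containing the printed file is the copy of the library that `import fastai` actually uses, which can differ from other copies that `find` turns up.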


(Andrew Nguyen) #516

You may have already figured this out, but in case you didn’t:

The issue you’re encountering is not a fastai thing, it’s a GCP thing. When first working on GCP, certain services, like Compute Engine, need to be enabled. Once the API is enabled for a project, it stays available for that project.
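
As far as I understand, answering `y` at that prompt has the same effect as enabling the API yourself with `gcloud services enable`. A small sketch follows. The wrapper `enable_compute_api` and the `DRY_RUN` switch are made up for illustration, and `my-project-id` is a placeholder, not a real project.

```shell
# Hypothetical wrapper around `gcloud services enable`.
# With DRY_RUN=1 it only prints the command instead of running it.
enable_compute_api() {
    project="$1"
    set -- gcloud services enable compute.googleapis.com --project="$project"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Placeholder project ID; this prints the command without touching the account.
DRY_RUN=1 enable_compute_api my-project-id
```

Drop `DRY_RUN=1` and pass your own project ID to actually enable the API.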


(Kathryn) #517

I set up my Google Cloud instance as described in the course guide, and I received an email from Google two days ago saying my GPU increase had been approved (indeed, it says my limit is now 1). However, when I try running this command: gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080 I get ERROR: (gcloud.compute.ssh) Instance [my-fastai-instance] in zone [us-west2-b] has not been allocated an external IP address yet. Try rerunning this command later.

The guide didn’t say anything about setting an external IP address (though I see there is some documentation from Google on reserving a static IP address, and it says one should automatically be assigned when you create a new instance). Has anyone seen this before?


(Kathryn) #518

Update: I added an external IP address per the instructions in this article, but now I am getting error code 255 when I try to run the ssh command: ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].

I decided to just create a new instance and now everything seems to be working fine.


#519

Hi all, I’m trying to install the collapsible headings nbextension, but I get this error:

[Errno 13] Permission denied: '/opt/anaconda3/lib/python3.7/site-packages/conda-4.6.2-py3.7.egg-info/PKG-INFO' -> '/opt/anaconda3/lib/python3.7/site-packages/conda-4.6.2-py3.7.egg-info/PKG-INFO.c~'

The command I’m running is conda install -c conda-forge jupyter_nbextensions_configurator
from the installation instructions page https://github.com/Jupyter-contrib/jupyter_nbextensions_configurator#installation.

Has anyone managed to install any nbextension on GCP?


(satyaveera) #520

Got the same error, but I’m running an Azure DSVM. :frowning_face:


(Lee) #521

Thanks for the tip here on the git stash command. Cheers.