Google Cloud Platform

I have my data stored in Google Drive; how can I access this file in GCP?

[EDIT] I tried wget

This downloads the data to the VM; however, I didn’t want to DOWNLOAD the data but to simply access Google Drive from the VM.

So I have an even more basic question.
I am running Windows 7 locally.
I am running my Google Compute Engine instance and JupyterHub on a virtual machine running Ubuntu.

When I download images in the first lesson, let’s say this image:
/home/jupyter/.fastai/data/oxford-iiit-pet/images/wheaten_terrier_36.jpg


#1
Where is it downloaded to?
Where can I find it?

Is it on my local virtual machine / Ubuntu?
Is it on my Google Compute Engine instance? If so, how do I get access to it there?

When I go to JupyterHub I start in tutorials/fastai/nbs/dl1…
I can’t find the images there, or in any other folder there for that matter…


#2
Another question: when I try to play with my own dataset…
Where do I ingest files so that I can access them? Into my JupyterHub?


I’m new here as well, but let me take a shot at answering your questions.

#1
Those images are downloaded to the Google Compute Engine instance. When you ran gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080, you were logged into that GCP instance. Whatever you do in the Jupyter notebooks will only affect that instance and not your local system.

The images reside in the /home/jupyter/.fastai/data folder of your GCP instance. You won’t see these folders in Jupyter as they’re hidden, but you can use the terminal window we used to run the above command to explore these files with the usual Linux commands:

cd /home/jupyter/.fastai/data
ls -la
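
As a quick sanity check from inside a notebook cell, you can also ask fastai itself where the data lives. This is just a sketch, assuming the fastai v1 API used in lesson 1:

from fastai.vision import untar_data, URLs

path = untar_data(URLs.PETS)        # reuses the already-downloaded data if present
print(path)                         # typically /home/jupyter/.fastai/data/oxford-iiit-pet
print(list(path.iterdir())[:5])     # peek at a few entries inside that folder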

#2
You can find the way to build your own dataset in the lesson2-download notebook. If you already have the dataset on your local machine, you can just upload it to JupyterHub as well.

Thank you very much for your prompt and straightforward reply.

Follow-up to #2.
So I have my dataset on my virtual machine: it’s over 6,000 images. Dragging and dropping into JupyterHub prompts me to click “upload” on every file… Clicking 6,000+ times doesn’t look efficient, so there has to be a better way.

I googled around and found other people having the same issue (https://stackoverflow.com/questions/34734714/ipython-jupyter-uploading-folder), with the resolution (from an official github.com/jupyter issue) being:

"Convert it into a single Zip file and upload that. to unzip the folder use the code down bellow

import zipfile as zf
files = zf.ZipFile("ZippedFolder.zip", 'r')
files.extractall('directory to extract')
files.close()

"

That’s great, but typing this in my GCE JupyterHub:

import os
import zipfile as zf

os.getcwd()
os.chdir('/')

ls    # works in a Jupyter/IPython cell via automagic; plain Python would need os.listdir('.')

reveals the following GCE folders:

bin/   boot/  dev/   etc/   home/  initrd.img@  initrd.img.old@  lib/   lib64/
lost+found/   media/  mnt/   opt/   proc/  root/  run/   sbin/  srv/   sys/
tmp/   usr/   var/   vmlinuz@  vmlinuz.old@
And I am not really sure if there is a way to navigate to my VM desktop.

Are there any better solutions than what I described, and/or is there a way to navigate from the GCE instance to the VM desktop in JupyterHub? Or, better yet, is there a way to just store these images in Google Cloud Storage and then pull them from there?

Okay so here is one way:

Under the Google Cloud Platform main menu, go to Storage.
Create a storage bucket.
Put a test CSV file in it (the gsutil sketch below does the same from the command line).
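
If you prefer the command line over the console, the same two steps can be sketched with gsutil; the bucket and file names here are just placeholders:

gsutil mb gs://YOURBUCKETNAMEHERE             # make the bucket
gsutil cp test.csv gs://YOURBUCKETNAMEHERE/   # upload a test CSV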

Then go to your Compute Engine instances.
Find your fastai instance and, under the “Connect” column, click SSH.

In the resulting window, follow (most of) the instructions outlined in the video here, starting at 1:20 (but read the additional points below before starting):

You will have to install GCSfuse to connect your bucket to your virtual machine:
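
For reference, the Ubuntu/Debian install commands from the gcsfuse docs looked roughly like this at the time; double-check against the current gcsfuse README before running them:

export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install gcsfuse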

After the script above, authorize access (so that you don’t get a security / bucket-access error) by running the following:

gcloud auth application-default login
(from https://esc.sh/blog/mount-gcs-bucket-linux/)

Then follow the instructions and authorize yourself: copy the link, paste the code, and so on.

When the video instructions get to this command:
gcsfuse YOURBUCKETNAMEHERE mnt/gcs-bucket

Change it to:
/usr/bin/gcsfuse BUCKETNAMEHERE /mnt/gcs-bucket

Why?
Follow this thread

If
/usr/bin/gcsfuse

doesn’t work, you can find where gcsfuse got installed by running the following command:
whereis gcsfuse
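
Also worth checking: the mount point has to exist before gcsfuse can mount into it. A minimal sketch, assuming the same /mnt/gcs-bucket path as above:

sudo mkdir -p /mnt/gcs-bucket       # create the mount point if it isn't there yet
sudo chown $USER /mnt/gcs-bucket    # make sure your user can write to it
/usr/bin/gcsfuse BUCKETNAMEHERE /mnt/gcs-bucket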

Finish the video and that’s it; you should be good to go.
Go to JupyterLab.
Create a notebook.
Run your checks:

import pandas as pd
import numpy as np
data = pd.read_csv("gs://YOURBUCKETNAMEHERE/YOURTESTFILE.csv")
data.head(5)
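
Note: reading a gs:// path from pandas goes through the optional gcsfs package; if you get an import error for it, you can either pip install gcsfs or just read through the fuse mount set up above (the file name here is a placeholder):

# Alternative: read via the gcsfuse mount instead of the gs:// URL
data = pd.read_csv("/mnt/gcs-bucket/YOURTESTFILE.csv")
data.head(5)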

Still wondering if there is a way to read stuff from my desktop in the VM’s JupyterHub…


I don’t think there’s a trivial way to read files on your desktop directly from your VM. You’d either have to copy them over to some cloud location like you’ve just done, or you can use the scp command to transfer the files directly to the VM.

Instructions for doing this on GCP are here; check the examples section. It’s pretty much the same as the ssh command, and you can transfer whole folders without making them into zip files.
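
For example, something along these lines from your local machine (the instance name, zone, and paths here are just placeholders):

gcloud compute scp --recurse ./my_dataset jupyter@my-fastai-instance:/home/jupyter/my_dataset --zone=us-west1-b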

Nice! Thanks! I will give it a try as well. Not sure how fast Google Cloud Storage is vs having the files directly on the VM.

Update: just found another way, even simpler, but one file at a time… :(
Same as before, click SSH under Google Cloud Platform.
Click on the gear-shaped icon in the top right corner. There is an option to upload a file…

Taken from here

I ended up doing this:

I couldn’t really find a way to use fast.ai’s DataBunch with Google Cloud Storage directly.
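
One workaround (an untested sketch, assuming fastai v1 and the gcsfuse mount described earlier in this thread; the folder layout under /mnt/gcs-bucket is hypothetical) is to point the DataBunch at the fuse-mounted path, which looks like an ordinary local folder:

from fastai.vision import ImageDataBunch, get_transforms, imagenet_stats

data = ImageDataBunch.from_folder(
    '/mnt/gcs-bucket/my_images',   # fuse-mounted bucket, one subfolder per class
    train='.',
    valid_pct=0.2,
    ds_tfms=get_transforms(),
    size=224,
).normalize(imagenet_stats)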

I am trying to deploy the bear model on Google Cloud Platform, but I keep getting this error complaining “cannot allocate memory”. Has anybody run into the same problem and have any insight on this?

Step #1 - "builder": INFO tar_runtime_package took 23 seconds
Step #1 - "builder": INFO starting: gzip_tar_runtime_package
Step #1 - "builder": INFO gzip_tar_runtime_package gzip /tmp/tmpuJcM44.tar -1
Step #1 - "builder": INFO gzip_tar_runtime_package took 0 seconds
Step #1 - "builder": INFO building_python_pkg_layer took 29 seconds
Step #1 - "builder": INFO uploading_all_package_layers took 34 seconds
Step #1 - "builder": INFO build process for FTL image took 258 seconds
Step #1 - "builder": INFO full build took 258 seconds
Step #1 - "builder": ERROR gzip_tar_runtime_package gzip /tmp/tmpuJcM44.tar -1
Step #1 - "builder": exited with error [Errno 12] Cannot allocate memory
Step #1 - "builder": gzip_tar_runtime_package is likely not on the path
Step #1 - "builder": Traceback (most recent call last):
Step #1 - "builder": File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
Step #1 - "builder": "__main__", fname, loader, pkg_name)
Step #1 - "builder": File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
Step #1 - "builder": exec code in run_globals
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__.py", line 65, in <module>
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__.py", line 60, in main
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__/ftl/common/ftl_error.py", line 77, in InternalErrorHandler
Step #1 - "builder": IOError: [Errno 2] No such file or directory: '""/output' Finished
Step #1 - "builder" ERROR ERROR: build step 1 "gcr.io/gae-runtimes/python37_app_builder:python37_20190527_3_7_3_RC00" failed: exit status 1

Hello,

Does anybody know what the best zone to choose for Google Cloud Platform is?
I’m from the Netherlands and picked the Amsterdam zone; that sounded the most logical to me.
I got everything working and started the first class, but half of the time I can’t start the VM because there are no resources available.
Are there differences in availability between zones, and is it maybe handier to use the us-west zone as stated in the setup guide?

I used europe-west4-b; that worked for me.
However, a lot of the time there are not enough resources available and I can’t start my instance.


Use the normal instance, not the preemptible one. It’s much more expensive, but you have the $300 GCP credit. You connect on the first try and it doesn’t disconnect.
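
Concretely, that just means dropping the --preemptible flag from the create command in the setup guide. A sketch (instance name, zone, and machine type are whatever you picked):

export IMAGE_FAMILY="pytorch-latest-gpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="my-fastai-instance"
export INSTANCE_TYPE="n1-highmem-4"

gcloud compute instances create $INSTANCE_NAME \
        --zone=$ZONE \
        --image-family=$IMAGE_FAMILY \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --accelerator="type=nvidia-tesla-k80,count=1" \
        --machine-type=$INSTANCE_TYPE \
        --boot-disk-size=200GB \
        --metadata="install-nvidia-driver=True"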


Hi, I have really run into a problem and I don’t know how to proceed…
I used “gcloud compute ssh –zone=us-west1-b jupyter@my-fastai-instance – -L 8080:localhost:8080”, but it said jupyter cannot be recognized. Can you help me?

Are you using an instance you created following the steps here? Can you give me a bit more info on your setup? Thanks.

Yes, I followed the steps at https://course.fast.ai/start_gcp.html. All goes well until the last step: I can’t use the Jupyter notebook, it cannot be connected, and I don’t know why. So I tried many other ways, like https://github.com/arunoda/fastai-shell. It’s very easy, but I ran into the same problem, just as shown in the picture. Would you mind communicating by email? My email is caiwangzheng@gmail. I have been trying to use Google Cloud Platform to run fastai for two weeks… I don’t know how to solve it… Thank you very much for your reply…


I’m sorry, this area isn’t my strength, so I may not be the best person to help. Looking very closely at the commands in the first screenshot, it looks like you used only one dash, -zone=us-west1-b, instead of two dashes, --zone=us-west1-b. When I tried it I got a similar error message to yours about unrecognized arguments. Hopefully that helps.

@MadeUpMasters @Timothy_ZHENG, it looks like I’m running into a similar issue to yours. I’m looking to launch a Jupyter notebook that is connected to a Google Cloud instance. I have followed all the steps in https://course.fast.ai/start_gcp.html, but I keep getting an error when I issue the gcloud compute ssh command. Any idea what I’m missing here? Thanks!

Error Message:
ssh: connect to host 35.233.250.67 port 22: Resource temporarily unavailable
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].

export IMAGE_FAMILY="pytorch-latest-gpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="fastai-instance"
export INSTANCE_TYPE="n1-highmem-4" 

gcloud compute instances create $INSTANCE_NAME \
        --zone=$ZONE \
        --image-family=$IMAGE_FAMILY \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --accelerator="type=nvidia-tesla-k80,count=1" \
        --machine-type=$INSTANCE_TYPE \
        --boot-disk-size=200GB \
        --metadata="install-nvidia-driver=True" \
        --preemptible

export ZONE="us-west1-b"
export INSTANCE_NAME="fastai-instance"
gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080

I think this means there are no preemptible instances available. You can try again later; that might work.
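
If you want to automate the "try again later" part, a rough sketch is to loop on the ssh command until it gets through (this reuses the $ZONE and $INSTANCE_NAME exports from above):

until gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080; do
    echo "No luck yet, retrying in 60 seconds..."
    sleep 60
done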

I am sorry, but I haven’t solved this problem yet. I am still struggling with it. We can communicate if I solve it.