How to set up env using Google Cloud?

MLNewbie · May 16, 2017, 11:30pm

Thanks. I looked into cs231n and https://haroldsoh.com/2016/04/28/set-up-anaconda-ipython-tensorflow-julia-on-a-google-compute-engine-vm/ and got conda and tensorflow installed. I have Jupyter up and running as well now. I checked the link eshvk posted as well, but it had a few scripts and I was not sure how or where to run them.

Hope I have everything I need now. You were awesome, and thanks for the quick responses.

MLNewbie · May 21, 2017, 2:25pm

Hi - I was trying Google Cloud with 2 vCPUs (13 GB memory, the high-memory version as well), but it takes very long: I waited 15 minutes and the sample folder from lesson 1 still had not completed. Can you please advise, from your findings, what the minimum configuration is for getting the sample data to work, and the minimum configuration for the actual data (e.g. the Lesson 1 data)?

I am trying to work out which configuration to use to keep the cost down. Thanks in advance.

sebastian · May 21, 2017, 2:50pm

Did you attach a GPU? Have you checked whether Theano is using the GPU?

I ran the first lessons on AWS, but the execution times should be similar on GCE, if you have everything set up properly.
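
One quick way to answer the Theano question is to ask Theano which device it is configured for. This is only a sketch: the function name `theano_device` is made up for illustration, and it assumes the `python` on your PATH is the one Theano was installed into.

```shell
# Sketch: print the device Theano is configured to use.
# "cpu" means the GPU is not being used; a value starting with
# "gpu" or "cuda" means Theano is configured for a GPU.
theano_device() {
  python -c 'import theano; print(theano.config.device)' 2>/dev/null \
    || echo "theano is not importable from this python"
}
```

Run `theano_device` on the instance, inside the same environment that Jupyter uses.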

sebastian · May 21, 2017, 3:12pm

By the way, please don’t take this the wrong way, but have you read http://wiki.fast.ai/index.php/How_to_ask_for_Help

I’d like to help, but please try to solve a problem by yourself first, and give enough information on what you have tried, what the possible cause might be, et cetera, before asking for help. And do a search first, here on the forums, but also a Google search, to see if you can find an answer to your question yourself.

MLNewbie · May 21, 2017, 4:54pm

Thanks. I tried all combinations without enabling a GPU (as per the post http://cs231n.github.io/gce-tutorial/), but it did not work. Since you were able to set things up on Google Cloud, I thought I would ask you for suggestions. Since enabling a GPU increases the cost, and I know I will make a lot of mistakes while learning, I wanted to see how I can maximise my hours available for ML, and hence whether it can be done without a GPU. Sorry for being a pain. You have been of great help and thanks a bunch.

sebastian · May 21, 2017, 5:13pm

You need GPUs for this course. Jeremy mentions this in the first lesson, and it is also explained in the notes of the first lesson. You can do part of the first lesson on an instance similar to t2.large, but it will take a long time.

Again, I love to help, but please watch the video and read the notes first.

eshvk · May 24, 2017, 5:18am

wanted to see how I can maximise my hours available for ML

Dropping by here with a quick anecdotal note. I ran dogs vs cats enhanced on an instance with a single K80 GPU; it took maybe six minutes per training epoch.

I am not convinced that a CPU instance is that much cheaper once you weigh cost against time, especially for these lessons.

sebastian · May 29, 2017, 11:11am

Just in case anyone is still having issues getting things running on Google Cloud, here are my steps for creating a DL instance from scratch. I’m using the script from https://github.com/fastai/courses. This uses Python 2.7 and Keras 1.2.2. I recommend using this config (instead of Python 3 and Keras 2) unless you know what you are doing.

STEP 1
I assume you already know how to create an instance on Google Cloud. See https://cloud.google.com/compute/docs/instances/create-start-instance if you don’t.

Create an n1-standard-1 instance with Ubuntu 16.04, a single GPU and a boot disk of 20GB. Create the instance in a zone where GPUs are available, see https://cloud.google.com/compute/docs/gpus/. You can use a different instance type with more CPU or memory if you want. Give the instance the network tag jupyter; we need this to create a firewall rule later.
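
If you prefer the command line to the console, the instance described above could be created roughly like this. Treat it as a sketch rather than part of the tested steps: the zone `us-east1-d`, the GPU type `nvidia-tesla-k80` and the image family `ubuntu-1604-lts` are assumptions you should check against the current GPU and image lists, and the helper names `run` and `create_dl_instance` are made up for illustration. By default it only prints the command so you can review it first.

```shell
# Sketch: create the GPU instance from STEP 1 with gcloud.
# Assumptions (not from the steps above): zone, GPU type and image family.
run() {
  # Print the command by default; set DRY_RUN=0 to execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

create_dl_instance() {
  name="${1:-deeplearning}"
  zone="${2:-us-east1-d}"
  # GPU instances cannot live-migrate, hence --maintenance-policy TERMINATE.
  run gcloud compute instances create "$name" \
    --zone "$zone" \
    --machine-type n1-standard-1 \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --image-family ubuntu-1604-lts \
    --image-project ubuntu-os-cloud \
    --boot-disk-size 20GB \
    --maintenance-policy TERMINATE \
    --tags jupyter
}
```

Calling `create_dl_instance deeplearning us-east1-d` prints the command; `DRY_RUN=0 create_dl_instance deeplearning us-east1-d` runs it.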

Optionally, create and attach a persistent data disk. This is not required for the lessons, but it can be useful if you want to keep data or models when deleting the instance. I named the instance “deeplearning”, so this becomes the name of the boot disk too and I named the data-disk “deeplearning-data”.
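
For the optional data disk, a command-line sketch could look like the following. The 50GB size and the zone are assumptions (pick your own), and `run` and `create_data_disk` are hypothetical helper names. The `--device-name` flag is there because the mount commands in STEP 8 look the disk up under /dev/disk/by-id/google-<device name>, so the device name should match the disk name.

```shell
# Sketch: create and attach the optional data disk from STEP 1.
# Assumptions (not from the steps above): the 50GB size and the zone.
run() {
  # Print the command by default; set DRY_RUN=0 to execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

create_data_disk() {
  instance="${1:-deeplearning}"
  zone="${2:-us-east1-d}"
  disk="${instance}-data"
  run gcloud compute disks create "$disk" --size 50GB --zone "$zone"
  # --device-name sets the name that appears as /dev/disk/by-id/google-<name>
  run gcloud compute instances attach-disk "$instance" \
    --disk "$disk" --device-name "$disk" --zone "$zone"
}
```

As written, `create_data_disk deeplearning us-east1-d` only prints the two commands; prefix it with `DRY_RUN=0` to execute them.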

STEP 2
Ssh into the instance.

STEP 3
Download the script that installs CUDA, Anaconda etc:
wget https://raw.githubusercontent.com/fastai/courses/master/setup/install-gpu.sh

STEP 4
Run the script:
sudo sh install-gpu.sh
At the end you need to pick a password for the jupyter notebook. This script also clones the course materials from https://github.com/fastai/courses/.

STEP 5
Reboot, either using the reboot command or the reset option in the console:
sudo reboot

STEP 6
Create a firewall rule for accessing port 8888 from your local machine, using the console or the command line:

export PROJECT="project_name"
export YOUR_IP="enter_the_ip_of_your_local_machine"
gcloud beta compute --project "${PROJECT}" firewall-rules create "jupyter" --allow tcp:8888 --direction "INGRESS" --priority "1000" --network "default" --source-ranges "${YOUR_IP}" --target-tags "jupyter"

STEP 6b
When the instance has restarted, ssh into the instance again and check if CUDA is installed properly:

sudo modprobe nvidia
nvidia-smi
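
If you want a shorter answer than the full nvidia-smi table, something like the sketch below prints one line per GPU. The function name `check_gpu` is made up for illustration; the query fields are standard nvidia-smi ones.

```shell
# Sketch: report the GPU name, driver version and memory,
# or say clearly that the driver tools are missing.
check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: the NVIDIA driver is probably not installed"
    return 1
  fi
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}
```

Run `check_gpu` after the reboot; if it reports that nvidia-smi is missing, the install script most likely did not finish.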

STEP 7
Run jupyter notebook:
jupyter notebook --ip=0.0.0.0 --port=8888
Note the token displayed in the terminal, then open the notebook in your browser at http://<external IP of your instance>:8888 and enter the token.
At this point everything is set up for doing the lessons.

STEP 8 (optional)
Format and mount the data-disk with the following commands. This mounts the data disk to the /opt/my_data directory. Feel free to use a different location.

export MOUNT_DIR=/opt/my_data
# $(hostname) expands to the name of the instance
export DISK_NAME=$(hostname)-data
export DISK_MOUNT_POINT=/dev/disk/by-id/google-${DISK_NAME}

sudo mkdir -p ${MOUNT_DIR}
sudo chmod 777 ${MOUNT_DIR}

# uncomment the next line to format the disk; don't do this if your data-disk already has data
# sudo mkfs.ext4 -F ${DISK_MOUNT_POINT}
sudo mount -o discard,defaults ${DISK_MOUNT_POINT} ${MOUNT_DIR}

# back up fstab
sudo cp /etc/fstab /etc/fstab_old

# change fstab so the disk is mounted after a reboot
printf "${DISK_MOUNT_POINT} ${MOUNT_DIR} ext4 defaults 0 0\n" | sudo tee -a /etc/fstab
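
One thing to watch: running the last command twice adds the same line to fstab twice. A possible guard, sketched below, only prints the entry when the device is not listed yet. The function name `fstab_line_if_missing` is hypothetical, and the third argument exists only so you can try it on a copy of fstab first.

```shell
# Sketch: print an fstab entry for the data disk only if the device
# is not already listed in the given fstab file (default /etc/fstab).
fstab_line_if_missing() {
  device="$1"
  mount_dir="$2"
  fstab="${3:-/etc/fstab}"
  # awk exits 0 when some line has the device as its first field
  if ! awk -v d="$device" '$1 == d { found = 1 } END { exit !found }' "$fstab"; then
    printf '%s %s ext4 defaults 0 0\n' "$device" "$mount_dir"
  fi
}
```

Usage would be `fstab_line_if_missing "${DISK_MOUNT_POINT}" "${MOUNT_DIR}" | sudo tee -a /etc/fstab`, which appends nothing when the entry already exists.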

brookisme · May 29, 2017, 8:22pm

This is my setup:

https://gist.github.com/brookisme/c16b3e34c7741c8261a978829eea566a

gcloud-gpu-setup.sh

#!/bin/bash
cd ~/

### CUDA
printf '\n\nChecking for CUDA and installing.\n'
if ! dpkg-query -W cuda; then
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  sudo dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  sudo apt-get update
  sudo apt-get install cuda -y

(This file has been truncated; see the gist for the full script.)

gcloud-gpu.md

## GCLOUD GPU 

_The number of virtual CPUs and the storage have little effect on cost. However prices increase dramatically with GPUs_

- n1-standard-8: 8 virtual CPUs and 30 GB of memory.
- count=4,2,1: # of GPUs 
- boot-disk-size: 200GB

###### NOTE ON SNAPSHOTS
If creating from a snapshot you must first create the disk

(This file has been truncated; see the gist for the full readme.)

There’s a setup script you need to upload to your instance, and a readme on how to do the rest.

brookisme · May 29, 2017, 11:55pm

Someone mentioned I needed to add the pip installation - just updated the markdown doc

sandskies · June 2, 2017, 5:30am

Please also add

sudo apt-get install git

to the script.

sandskies · June 2, 2017, 7:10am

When I try to set up the GPU on my Google Cloud instance using the script, it fails at the step

$ sudo apt-get -y install cuda

and the error message is

$ sudo apt-get -y install cuda
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
cuda : Depends: cuda-8-0 (>= 8.0.61) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

mele · June 5, 2017, 5:53pm

Thank you, sebastian, for your instructions.
I finished up to step 7, but I can’t access the Jupyter notebook (address: <external IP>:8888/?token=XXXXXXXXX).
The error messages are “access denied” or “took too long to respond”.
I suspect the problem is in the firewall rules.
I used your step 6 firewall rule and then tried another one (such as --source-ranges 0.0.0.0/0).
How can I fix it?

sebastian · June 5, 2017, 6:14pm

Did you start Jupyter with the following command?
jupyter notebook --ip=0.0.0.0 --port=8888

The firewall settings should look similar to this:

The source IP range should contain the IP address of your local machine (so not 0.0.0.0).

And finally, did you add the tag jupyter to your instance?
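
Both the tag and the rule can also be checked from the command line. The sketch below assumes an instance called deeplearning in zone us-east1-d (placeholders, use your own), and the helper names `run`, `check_jupyter_access` and `add_jupyter_tag` are made up for illustration. It prints the commands by default.

```shell
# Sketch: inspect the network tag and the firewall rule, and add the tag
# if it is missing. Instance name and zone are placeholders.
run() {
  # Print the command by default; set DRY_RUN=0 to execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

check_jupyter_access() {
  instance="${1:-deeplearning}"
  zone="${2:-us-east1-d}"
  # Shows the network tags on the instance; "jupyter" should be among them.
  run gcloud compute instances describe "$instance" --zone "$zone" \
    --format "value(tags.items)"
  # Shows the rule, including its sourceRanges and targetTags.
  run gcloud compute firewall-rules describe jupyter
}

add_jupyter_tag() {
  run gcloud compute instances add-tags "$1" --zone "$2" --tags jupyter
}
```

With `DRY_RUN=0`, `check_jupyter_access deeplearning us-east1-d` shows what is currently configured, and `add_jupyter_tag deeplearning us-east1-d` adds the tag.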

mele · June 5, 2017, 6:27pm

Thank you very much!!
I missed adding the tag to my instance. Now I can access my jupyter notebook.
I really appreciate your reply.

sebastian · June 5, 2017, 6:57pm

No problem, glad you got it working!

justaboutnormal · June 6, 2017, 10:28pm

I’ve been following along to get my instance running too. I’m wondering how to set up the network rules so that I don’t have to change the rule every time my ISP changes my IP address.

sebastian · June 6, 2017, 10:45pm

Maybe create a bash script that changes the firewall settings automatically, before you start jupyter?

Or you’ll have to look at https://jupyterhub.readthedocs.io/en/latest/ for securing jupyter itself, instead of relying on firewall settings for security.
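
For the bash script option, a rough sketch could look like this. It assumes the rule is named jupyter as in the steps earlier in this thread, and that https://api.ipify.org returns your public IPv4 address as plain text (an assumption, any similar service would do). The function names are made up for illustration, and by default the gcloud command is only printed.

```shell
# Sketch: point the "jupyter" firewall rule at your current public IP.
# Assumption: https://api.ipify.org answers with the caller's IPv4 address.
is_ipv4() {
  # Loose format check only (four dot-separated groups of 1-3 digits).
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

update_jupyter_rule() {
  # Use the IP given as argument, otherwise look it up.
  ip="${1:-$(curl -s https://api.ipify.org)}"
  if ! is_ipv4 "$ip"; then
    echo "could not determine a valid IPv4 address: '$ip'" >&2
    return 1
  fi
  set -- gcloud compute firewall-rules update jupyter --source-ranges "$ip/32"
  # Print the command by default; set DRY_RUN=0 to execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}
```

Running `DRY_RUN=0 update_jupyter_rule` from your local machine before starting Jupyter would then keep the rule in sync with your ISP-assigned address.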

brookisme · June 7, 2017, 4:57am

Hey everyone. I moved my gist over to a full repo and made a few changes:

In the setup script I added the pip installation, fixed a bug or two and included the conda py2 environment. I then added a create_instance script. So at this point the main setup comes down to this…

# local env
$ . create_instance.sh gpu-84 4
$ gcloud compute copy-files gpu-setup.sh gpu-84:~/

# remote instance
# Note: this sets up a Py3 environment with Keras 2. It also creates a Py2 environment with Keras 1 (`source activate py2`).
$ . gpu-setup.sh

There are a couple of things to cut and paste from the readme to get CUDNN installed, but it runs fast. The readme also has some checks to run, plus info on jupyter pwd and syncing sublime.

peaky · June 12, 2017, 7:13pm

Hi @sebastian,

Thank you for the awesome steps for working with GCP. I have followed your steps and am stuck at the last step, connecting to the Jupyter notebook. I am getting the error “104.198.15.91 took too long to respond” in the browser. As you suggested in another post, I have checked the firewall rules, and they are as expected.
Any idea what might be going wrong here?
Below is a snapshot of the firewall rule: