Google Cloud Platform for fast.ai part1 v2

nok · November 30, 2017, 1:01pm

I happened to be trying out Google Cloud when I saw @rachel’s blog post asking for someone to write a guide for it, so I decided to write a blog post to organise what I have done. After struggling for a few days, I can finally launch Jupyter Notebook successfully on GCP. You get $300 of free credit with a Google account.

Thanks to @sebastian for his very useful comments. This post mainly reorganises his threads; I didn’t write a single line of code myself.

For details, please go to the blog post.

nok · December 2, 2017, 6:13pm

Hi Seema, I cannot provide a comparison with AWS. But to me, Google Cloud looks a bit cleaner than AWS, and you get $300 of credit. Unless you are using a lot of computing power, I think $300 should be OK.

With a 200GB SSD, 1 K80 GPU, and 32 GB RAM, the credit lasts for about half a month of running, so it should be OK as long as you don’t keep the instance running for weeks.
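
To make the arithmetic behind that estimate explicit, here is a minimal Python sketch. It assumes "half a month" means about 15 days of non-stop running, which is an interpretation and not a quoted GCP price; the function names are made up for illustration.

```python
def implied_hourly_rate(credit_usd: float, days: float) -> float:
    """Hourly cost implied by a credit that lasts `days` of non-stop running."""
    hours = days * 24
    return credit_usd / hours


def hours_of_runtime(credit_usd: float, hourly_rate_usd: float) -> float:
    """How many hours a credit covers at a given hourly rate."""
    return credit_usd / hourly_rate_usd


# Assumption: $300 credit used up over roughly 15 days of continuous running.
rate = implied_hourly_rate(300, 15)
print(f"implied rate: ${rate:.2f}/hour")
print(f"hours covered at that rate: {hours_of_runtime(300, rate):.0f}")
```

Under that assumption the configuration works out to a bit under a dollar per hour, so stopping the instance whenever you are not using it is what makes the credit last.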

akshaykr · December 30, 2017, 8:31am

Hey nok, I read and followed your blog post. In the last step, when I launch Jupyter Notebook, it gives an ERR_CONNECTION_TIMED_OUT error.
I have tried different ports and changed the IP ranges in the firewall rules (tried both 0.0.0.0 and my public IP), but I just couldn’t get it working. Any thoughts on what I could be doing wrong?
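
One way to narrow down a timeout like this is to check whether the port is reachable at all, independently of the browser. The sketch below is only a diagnostic aid, not a fix. The host and port in the demo are placeholders: 8888 is Jupyter's default port, but the thread does not say which port was used, and you would replace 127.0.0.1 with your VM's external IP when testing from your own machine.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder values: swap in the VM's external IP and your Jupyter port.
    print(port_is_reachable("127.0.0.1", 8888))
```

If the check succeeds when run on the VM itself against 127.0.0.1 but fails from your own machine against the external IP, the problem is likely between the two (for example the firewall rule, or Jupyter listening only on localhost) and not in the notebook server itself.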

nok · January 19, 2018, 1:04pm

I just moved this thread to v2.

Run curl http://files.fast.ai/setup/paperspace | bash. You will see it warn that some directories don’t exist, because we are not using the fast.ai template in Paperspace.

Run wget http://files.fast.ai/setup/paperspace
Then run vim paperspace and delete that line of code.

stas · May 17, 2018, 3:22am

Does anybody have a good recommendation for a speed-cost effective setup?

I started with n1-highmem-2 (2 vCPUs, 13 GB memory) and normal HD 30GB (free included) and it’s pretty slow.

I’m not yet sure where the bottleneck is - slow HD or not enough cores. Of course I could crank up the configuration but I’m trying to find the optimal setup to stretch the $$ for longer.

My surface analysis so far:

ram: plenty of free ram available - probably don’t need as much - wasting $$ here so far.
disk: I checked iotop - there is little io happening there most of the time, so the free HD should be just fine.
cpu: the load on the machine is about 4-6, so I suppose the next thing to try is to increase the number of CPU cores and see whether this significantly speeds things up.

this is based on running lessons 1 and 2 (from part 1).

any suggestions/recommendations?
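
A rough way to quantify the CPU observation above is to compare the load average with the number of cores. This is a minimal sketch, assuming a Linux or other Unix machine (os.getloadavg is not available on Windows); the function names and the threshold of 1.0 per core are illustrative choices, not a fixed rule.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Load average divided by the number of CPU cores."""
    return load_avg / cores


def looks_cpu_bound(load_avg: float, cores: int, threshold: float = 1.0) -> bool:
    """Heuristic: more runnable work than cores suggests the CPU is saturated."""
    return load_per_core(load_avg, cores) > threshold


if __name__ == "__main__":
    one_minute_load, _, _ = os.getloadavg()  # Unix only
    cores = os.cpu_count() or 1
    print(f"load per core: {load_per_core(one_minute_load, cores):.2f}")
    print(f"looks CPU bound: {looks_cpu_bound(one_minute_load, cores)}")
```

With the numbers from the post, a load of 4-6 on 2 vCPUs gives 2-3 per core, which is consistent with the CPU being the bottleneck. On Linux the load average also counts processes waiting on disk, so it is worth reading it together with the iotop observation.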

nok · May 17, 2018, 7:23am

My recommendation is to use an SSD, since it is cheap anyway; the real cost is the GPU. In general I use a standard instance with 4 CPUs and 26GB RAM. Its cost is insignificant compared to the GPU, and it allows me to have 2 notebooks running in parallel without running out of memory.

Also, you can change your configuration by just editing it before you fire up an instance.

Also, I use a Preemptible VM, which costs less than $0.2/hour for a K80 GPU. The downside is that the instance is not always stable: they may shut down your instance whenever resources are tight. But for just going through the notebooks, it is totally fine. If you need to train for more than a few hours, using a normal instance is safer, or just save checkpoints.
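
The checkpoint idea can be sketched in plain Python as follows. This is only an illustration of the save-and-resume pattern, not the fast.ai API: the function names, the JSON file, and the stop_after parameter (which simulates the VM being preempted) are all made up for the example. In a real run you would also save the model weights with your framework's own save function.

```python
import json
import os


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then rename, so a shutdown mid-write cannot corrupt it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"epoch": 0}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_epochs: int, stop_after=None) -> int:
    """Run (or resume) training; returns the number of completed epochs."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and epoch >= stop_after:
            break  # simulated preemption
        # ... the real training work for one epoch would go here ...
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)
    return state["epoch"]
```

For example, train("ckpt.json", 5, stop_after=2) stops after two epochs, and a later train("ckpt.json", 5) picks up at the third epoch instead of starting over.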

stas · May 17, 2018, 6:49pm

Thank you for your suggestions, nok.

I will need to experiment with different settings. Running 2+ jobs in parallel definitely saves resources, as the GPU is only partially utilised in a simple lesson run.

yesterday, running this setup for 5h I got charged:

I shall try the preemptible instance next.