Platform: GCP ✅

AndreaPi · September 17, 2019, 7:57pm

Well, not exactly useless, but indeed they’ve been preempted quite frequently. I’ve been able to train models all the same, though. It probably helps that I recently started, so I’m still at lessons 1-2, which are probably less compute-intensive than the others. Anyway, I found that sometimes changing zone helps: us-west2-b is maybe the GCP zone with the highest demand right now. You could follow my suggestions to find a zone which still has P4 GPUs, but not as much demand as us-west2-b.

Alternatively, you could create a standard (non-preemptible) instance, but of course that would be considerably more expensive.

leoauri · September 20, 2019, 9:13am

Hi, I’m just getting started.
I tried to follow the tutorial instructions, but when I run the command to create an instance, I get an error saying:

ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - The resource 'projects/{...}/zones/us-west2-b/acceleratorTypes/nvidia-tesla-p100' was not found

I referred to Google GPUs on Compute Engine docs to find a zone providing nvidia-tesla-p100, changed the zone in the command, and it worked. Maybe the tutorial should be updated?

mrajaram · September 26, 2019, 3:52pm

I’d like to bounce this idea off of y’all about managing the GCP environment.

I’ve created a preemptible fastai instance based on the tutorials, and have been using it for the fastai notebooks and Kaggle competitions. Lately, as other posters have mentioned, the connectivity for the preemptible instances has been frustrating.

Question: how easy is it to use the same persistent drive with both a preemptible and standard (persistent) instance?

Would I be able to point both instances at the same disk (maybe with an image?), or will I need to ‘detach’ this disk and ‘reattach’ it to whichever instance I am using at the moment?

Does anyone else have a workflow like this? (Different instances used with the same disk?)

Many thanks in advance

redexces · October 2, 2019, 4:38am

I’m a relative noob to Linux; I did some limited fiddling in a prior work setting but nothing serious. In order to save GCP billing time, I’ve cloned the fastai v3 files and libraries to my local laptop to practice operations which don’t involve training, such as downloading images and other stuff.

So I’ve used the lesson2-download notebook to download a bunch of images from Google and arrange them into folders as prescribed, but on my laptop. I then manually scrubbed the images to remove the obviously irrelevant and outlier images. I even ran the training on my local machine, which was very slow as expected, but that is not my point. What I attempted to do was use Linux scp to copy the image directories from my laptop into my GCP instance, but for the life of me I can’t figure out what I’m doing wrong. I tried to follow a number of guides on the web, but to no avail.

After logging into my GCP server, I entered the following command:
$ scp -r HP-ENVY@[2600:1700:6470:15a0:c8b7:4e4d:ba18:348d]:"C:\Users\redex\OneDrive\Documents\Education\Fastai_Course_v3\nbs\dl1\data\autos" /home/jupyter/tutorials/fastai/course-v3/nbs/dl1/data

It returns the following error:
ssh: connect to host 2600:1700:6470:15a0:c8b7:4e4d:ba18:348d port 22: Network is unreachable

I’m using the -r option since I’m copying a directory. HP-ENVY is my laptop name, and the IPv6 address is from my wifi connection, which sits behind my home gateway. The path is in quotes as my laptop is Windows 10. I’m assuming that my laptop is the remote device relative to the server terminal, hence the IP address included.

I realize this forum thread is not necessarily for Linux, but I figured it’s at least related to GCP. Advice from any Linux experts on this would be much appreciated.

Note: I eventually just replicated the image download process in GCP, but it sure would be good to know how to make scp work in case I need it in the future.

Brainkite · October 2, 2019, 8:53am

I’m having difficulties working on GCP with preemptible instances recently. I either cannot start the instance or get disconnected every 10 to 20 minutes.
I’ve been trying different regions (us-west, us-central, us-east, europe-west) but could not find a zone where I can work well.
What regions are you using?

mrajaram · October 2, 2019, 10:44pm

I started with us-west, and then duplicated my instance on us-central after us-west servers were down.

On us-central, I’ve been experiencing issues similar to the ones you mentioned. It’s been annoying me so much that I now log every time my instance can’t start or is remotely closed. Anecdotally, I’ve found that I will have 2-4 sessions of just a few minutes, and then a longer session that lasts a few hours.

ObSkewer · October 8, 2019, 11:24pm

Yeah, same here. Have only gotten my setup sorted today, but it’s been pretty frustrating, as constantly being kicked has been really hampering my ability to even complete the first notebook (generally get down to the fine-tuning section, then get kicked and have to re-run everything again).

Thinking about moving to Kaggle - at least while I get my feet wet and get a study routine in place.

Loving the course so far, and the notebooks are awesome - so no issue with the fastai side at all! Thanks

redtailedhawk · October 13, 2019, 1:31am

You have to start your instance from inside cloud.google.com. Go to the Compute > Instances tab. Click on start and make sure it’s running.

This error comes from you trying to ssh into an instance that isn’t turned on.

sambaths · October 13, 2019, 4:27am

The instance was on, but I fixed this.
In my case, I believe my Internet Provider was blocking this port, and hence I got this error.
When I tried the same using another connection, it worked.

amin_nejad · October 22, 2019, 1:44pm

Seems like it was updated, and the tutorial now shows us-west1-b for an nvidia-tesla-p4 GPU. However, I got the same error as you. According to the link you posted, the P4 is no longer available in this location, so the tutorial needs to be updated again.

dries · October 24, 2019, 10:42am

I am getting this error on step 3 while closely following the GCP tutorial (https://course.fast.ai/start_gcp.html):

ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - The resource 'projects/…/zones/us-west1-b/acceleratorTypes/nvidia-tesla-p4' was not found

What happened?

4722794 · October 24, 2019, 9:07pm

@dries

You could try export ZONE="us-west2-b" because that worked for me.

dries · October 25, 2019, 12:19pm

Thanks! That did the trick.

porkii · October 25, 2019, 3:31pm

This did the trick for me too. This should be added to the server setup tutorial.

warun · October 28, 2019, 11:33pm

Hi,

Instead of stopping my instance, I clicked on the delete option by mistake.

Is there any way of getting my instance back?

If I cannot retrieve it and end up creating a new instance, will I be able to create the new instance with the cheap billing option that offers the $300 credit?

Thanks.

enr · October 29, 2019, 10:56am

Hello,

I deployed a server with Ubuntu on GCP and with fastai at: https://console.cloud.google.com/marketplace/details/click-to-deploy-images/deeplearning

For some long training runs, I would like the Jupyter notebook to keep running even when my browser or local computer is turned off.
I read that this is possible with screen or tmux, but so far I have not figured out exactly how to do it.

Can somebody explain to me how to do that with any appropriate tool?

bwarner · October 30, 2019, 1:32am

Deleting and recreating instances on GCP is fine. So you’ll need to recreate it, which will turn the machine on for the first time.

You will be charged for the first minute no matter what:

All vCPUs, GPUs, and GB of memory are charged a minimum of 1 minute. For example, if you run your virtual machine for 30 seconds, you will be billed for 1 minute of usage.

After 1 minute, instances are charged in 1 second increments.
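
To make the quoted billing rule concrete, here is a rough Python sketch of it. The function name `billed_seconds` is made up for illustration, and rounding a partial second up to the next whole second is my assumption; the quote only states the 1-minute minimum and the 1-second increments.

```python
import math


def billed_seconds(runtime_seconds: float) -> int:
    """Approximate billed time for one run of an instance, in seconds.

    Assumptions (not from the quoted docs): a run of zero length is not
    billed, and a partial second is rounded up to a whole second.
    """
    if runtime_seconds <= 0:
        return 0
    # Minimum charge of 1 minute, then 1-second increments after that.
    return max(60, math.ceil(runtime_seconds))


print(billed_seconds(30))  # 30-second run is billed as a full minute: 60
print(billed_seconds(90))  # past the first minute, billed per second: 90
```

So deleting and recreating an instance costs at most about one extra minute of usage per restart under this rule.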

warun · November 2, 2019, 2:52am

Thank you!

MiladDakka · November 9, 2019, 3:42am

Just a brief comment regarding a workaround I needed to get fastai working on Windows.

Hello, new member of the course and forums here so please accept my apologies if I haven’t posted properly (couldn’t find a Post button so I am replying to the pinned post).

In short, Step 3 of the Tutorial on GCP setup suggests using an Ubuntu terminal, but I had a very annoying issue as my Windows (or Microsoft) Store would not install anything no matter how many things I tried (I was not about to reinstall Windows entirely).

After several gruelling days, my fix was to install Google Cloud SDK manually (after setting up a GCP billing account and project, of course), then within Cloud SDK type “gcloud init” followed by “gcloud compute ssh PROJECT-NAME -- -L 8080:localhost:8080”, and finally open “http://localhost:8080” in my browser (where PROJECT-NAME is the name of your VM instance).

I really hope this might help anyone with this silly Windows Store issue (quite a common one apparently) trying to set up on GCP.

Very excited to deep learn!

Kind regards,
Milad

PyGeek03 · November 9, 2019, 12:19pm

Hi everyone,

I’m just wondering about the actual cost of the “Budget Compute” setup in the setup guide. According to the guide, “this setup will roughly double your training time”. Doesn’t this theoretically mean that to accomplish the same amount of work as the standard setup could in 1 hour (which costs $0.38), the budget setup would need 2 hours (which costs 2 x $0.23 = $0.46, an increase of 8 cents)?
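
The arithmetic above can be sketched in a few lines of Python. The hourly rates and the 2x slowdown are the figures quoted from the guide; the helper name `cost_for_same_work` and the break-even calculation are only for illustration, and the sketch assumes the whole billed time is spent training.

```python
STANDARD_RATE = 0.38  # $/hour, standard setup (figure quoted above)
BUDGET_RATE = 0.23    # $/hour, budget setup (figure quoted above)


def cost_for_same_work(rate_per_hour: float, standard_hours: float,
                       slowdown: float = 1.0) -> float:
    """Cost of doing work that takes `standard_hours` on the standard setup.

    `slowdown` is how many times longer the same work takes on this setup.
    """
    return rate_per_hour * standard_hours * slowdown


standard_cost = cost_for_same_work(STANDARD_RATE, 1.0)
budget_cost = cost_for_same_work(BUDGET_RATE, 1.0, slowdown=2.0)

# Slowdown at which both setups cost the same for the same work.
break_even_slowdown = STANDARD_RATE / BUDGET_RATE

print(f"standard: ${standard_cost:.2f}, budget: ${budget_cost:.2f}")
print(f"break-even slowdown: {break_even_slowdown:.2f}x")
```

Under these assumptions the budget setup only comes out cheaper if the real slowdown stays below roughly 1.65x, which is why the measured slowdown matters.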

Has anyone encountered this theoretical problem with the budget setup? What is your actual experience with it, i.e. how much slower is it on average?

Thanks a lot!