Platform: GCP ✅

Try localhost:8080/tree. It worked for me when I used the notebook, and did not work when I used the lab (localhost:8080/lab).

Are you replying to me or ?

I’m running it from the cloud, there is no localhost.

I was just testing out the Google App Engine stock deployment from https://course.fast.ai/deployment_google_app_engine.html before doing my own model but it fails:

Step 7/9 : RUN python app/server.py
 ---> Running in d331a27e7048
Traceback (most recent call last):
  File "app/server.py", line 38, in <module>
    learn = loop.run_until_complete(asyncio.gather(*tasks))[0]
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "app/server.py", line 31, in setup_learner
    tfms=get_transforms(), size=224).normalize(imagenet_stats)
  File "/usr/local/lib/python3.6/site-packages/fastai/vision/data.py", line 165, in single_from_classes
    return sd.label_const(0, label_cls=CategoryList, classes=classes).transform(ds_tfms, **kwargs).databunch()
TypeError: transform() got multiple values for argument 'tfms'
The command '/bin/sh -c python app/server.py' returned a non-zero code: 1
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/docker" failed: exit status 1

I actually tried it with my own model first and it failed, so I reverted to the stock example to see if that worked, and it doesn't.
Any clues? It's instructive, actually; it would also be really useful to know best practices for debugging issues like this, as they can be a real pain.

I get the same error trying to build an image using local Docker for Windows as well.

UPDATE:
Fixed: it worked after changing tfms to ds_tfms, per the thread "Transform() got multiple values for argument 'tfms' - Lesson 2".
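For anyone hitting the same thing, the root cause is generic Python rather than fastai-specific: the transforms ended up being passed positionally into .transform() while the caller's kwargs also contained tfms, and a function that receives the same argument both positionally and by keyword raises exactly this TypeError. A minimal, fastai-free sketch of the failure mode (the names here are illustrative, not the real fastai signatures):

```python
# Minimal reproduction: the same argument arrives both positionally
# and through **kwargs, which Python rejects with a TypeError.
def transform(tfms=None, **kwargs):
    return tfms

caller_kwargs = {"tfms": "result of get_transforms()"}  # hypothetical caller

try:
    transform("positional value", **caller_kwargs)
except TypeError as e:
    print(e)  # transform() got multiple values for argument 'tfms'
```

Renaming the keyword in server.py from tfms to ds_tfms avoids the collision, which is why that one-word change fixes the build.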

Hi, did you find a solution for this? I've been having this issue for a while now. Do you mind sharing the solution here?
The port 8080 is already in use, trying another port.

Can anyone tell me how to fix this issue ("the port 8080 is already in use")? I've been trying for a while and still can't crack it. If anyone has hit this same issue or can solve it, please let me know. Just so you know, I am trying this on my Mac.

I restarted the machine; that seemed to work.
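If rebooting feels heavy-handed: on a Mac, lsof -i :8080 shows which process is holding the port (often a stray Jupyter server or Docker container), and you can kill it or just forward a different port. A small sketch for finding a free port programmatically (my own helper, not part of any setup script):

```python
import socket

def find_free_port(start=8080, attempts=10):
    """Return the first port in [start, start+attempts) that can be bound."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("localhost", port))
                return port  # nothing else is bound here, so it is free
            except OSError:
                continue     # in use; try the next one
    raise RuntimeError("no free port found")

print(find_free_port())  # e.g. 8080 itself if nothing is using it
```

You can then pass the free port to the -L flag of the ssh command instead of 8080.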

Hello Everybody,

Lately I've noticed that my normal model is taking too long to run. For example, a classification problem that used to take 1 minute per epoch is now taking 16-17 minutes. What could be the issue here?

I am using (n1-highmem-8 (8 vCPUs, 52 GB memory)) and a 1 x NVIDIA Tesla K80 on GCP.

I would appreciate it if somebody could help here.

Regards,
Sahil
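One thing worth ruling out first (an assumption on my part, since I can't see your setup): a roughly 16x slowdown is the classic symptom of training silently falling back to the CPU, e.g. after a driver problem. A quick check from the instance:

```python
import shutil
import subprocess

# If the driver no longer sees the GPU, frameworks quietly fall back to CPU.
if shutil.which("nvidia-smi"):
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    print(out.stdout or out.stderr)  # lists GPUs, e.g. "GPU 0: Tesla K80 (...)"
else:
    print("nvidia-smi not found - NVIDIA driver may not be installed")

# In a notebook you can also check directly (requires PyTorch):
#   import torch; print(torch.cuda.is_available())
```

If nvidia-smi fails or torch.cuda.is_available() returns False, the slowdown is almost certainly the CPU doing the work.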

Hi all,
So after playing around with Salamander a bit, I got around to installing Google Cloud environment which is much more mature (IMHO).

I would like to speed things up while avoiding errors like:
"RuntimeError: CUDA out of memory. Tried to allocate 8.00 MiB (GPU 0; 7.43 GiB total capacity; 6.92 GiB already allocated; 4.94 MiB free; 19.30 MiB cached)".
Since the K80, V100, and T4 GPU names don't mean much to me, I would appreciate a recommendation for a very fast setup that is proven to complete processing rapidly. I don't mind paying $5-10/hour (or more if needed), given that processing time improves dramatically over the setup suggested in the docs.

Further Questions:

  1. What would generally work better: horizontal scaling (more GPUs), or vertical (stronger GPU models)?
  2. It seems that the default “n1-highmem-8” provides 8 CPU cores with 52 GB of RAM. But how does that make sense with an 8 GB GPU? My intuition is that I would need far more GPU capacity to leverage the “n1-highmem-8”, since most of the pressure is on the GPU. What am I missing?
  3. What are the effects of upgrading CPUs / Storage?

Thanks!

EDIT:
For now I'm using “n1-highmem-8” with nvidia-tesla-p100,count=1.
It seems to be fast enough… I have no idea how much the bill will be, and Google Cloud makes it really hard to find out.
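Going back to the CUDA out-of-memory error quoted above: regardless of which GPU you pick, the usual lever is batch size, since activations scale with it. A hedged sketch of the retry-with-smaller-batch pattern (fit_with_fallback and train_fn are my own hypothetical names, not a fastai API):

```python
def fit_with_fallback(train_fn, bs=64, min_bs=4):
    """Retry training with a smaller batch size whenever the GPU runs out
    of memory. train_fn is a hypothetical callable that takes a batch size
    and raises RuntimeError("CUDA out of memory ...") when it is too big."""
    while bs >= min_bs:
        try:
            return train_fn(bs)
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise  # a different error: don't swallow it
            bs //= 2   # halve the batch size and retry
    raise RuntimeError("could not fit even the smallest batch size")
```

With fastai v1 this corresponds to recreating the DataBunch with a smaller bs; halving bs roughly halves activation memory, which is usually what the allocator ran out of.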

You should be able to estimate pricing using the pricing calculator.

Make sure you shut down the instance when not in use, and select a preemptible instance to reduce cost.
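To make the estimate concrete: the math is just (VM hourly rate + GPU hourly rate) * hours, with preemptible instances cutting the rate further. The rates below are made-up placeholders, not real GCP prices, so substitute the numbers from the calculator:

```python
# Placeholder hourly rates - NOT real GCP prices; read them off the calculator.
vm_per_hour = 0.40           # assumed rate for the VM
gpu_per_hour = 1.46          # assumed rate for the attached GPU
preemptible_discount = 0.5   # assumed: preemptible roughly halves the price

hours_per_month = 20
full_price = (vm_per_hour + gpu_per_hour) * hours_per_month
print(round(full_price, 2))                         # 37.2
print(round(full_price * preemptible_discount, 2))  # 18.6
```

The main variable in practice is hours_per_month, which is why shutting the instance down matters more than the exact rate.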

Thanks @amardeep, exactly what I was looking for!

Any idea why I’m getting this error:

jupyter@fast-ai-instance:/home$ mkdir test01
mkdir: cannot create directory 'test01': Permission denied

@deep-learner, that’s expected behavior. The directories right under /home are the “homes”, or starting directories, of server accounts and usually match the names of the accounts. We should not create directories at that level. The best place to create directories is under your own home directory, for example:

jupyter@my-fastai-instance:/home$ cd ~
jupyter@my-fastai-instance:~$ pwd
/home/jupyter
jupyter@my-fastai-instance:~$ mkdir test01
jupyter@my-fastai-instance:~$

@aloy Thanks! So “/home/jupyter” is my home directory? Does this mean that “jupyter” is basically my username? I think that’s what was confusing me.


Yes, when connecting via ssh, the part before the @ is your username. So you connected (in a roundabout way) over ssh with the username jupyter. The image was preconfigured with that username, and that's the name the instance uses to identify the user.


I’m suddenly facing this issue where my browser is unable to load “http://localhost:8080/tree” – the wheel keeps spinning and the browser status bar says “waiting for localhost.” Everything was working fine for a few days, but I just started facing this issue today. I tried stopping and starting the instance a few times, but it doesn’t help. Any idea what may be going on? Thanks in advance! :slight_smile:
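In case it helps anyone with the spinning "waiting for localhost": a quick way to check whether anything is actually listening on the forwarded port (a minimal sketch; localhost:8080 is just the default from the course setup):

```python
import socket

def port_open(host="localhost", port=8080, timeout=2):
    """Return True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# If this prints False, the ssh tunnel (gcloud compute ssh ... -L 8080:...)
# is probably not running, so the browser has nothing to connect to.
print(port_open())
```

If the port is closed, re-run the gcloud ssh command with the -L flag before retrying the browser; if it is open but the page still hangs, the Jupyter server on the instance itself may have died.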

Also, can anyone help me understand what this line of code is doing, and how it’s launching a Jupyter notebook?

gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080

Specifically, what do the two 8080s (before and after localhost) refer to? Thanks! :slight_smile:

It means “let me interact with the remote computer’s port 8080 through my local port 8080”.

The standard (insecure) web port is 80, so you’ll often find things that run locally through a web interface using port 8080 or 8888, just because they’re easy to remember and type. Really, though, it could be almost any number, and port numbers are conventions rather than firm rules.

Hope that helps.

1 Like

Hi, when I run the jupyter notebook list command on my server, there are no running servers. Why?

Can anybody explain what Google Cloud ML Notebooks is? It sounds like something halfway between Colab and a GCP VM, but I’m not able to figure it out. What would be the advantages and disadvantages of switching from a GCP VM to it? Thanks! :slight_smile: