GCP gcloud SSH Google App Engine deployment shows 2 error messages at the final stage

dummie · June 26, 2019, 8:44pm

Hi, I am trying to deploy the lesson 2 bear classifier model on GCP Google App Engine from a gcloud SSH session. After I run ‘gcloud app deploy’, the code itself seems fine, but the last stage, “Updating service [default] (this may take several minutes)”, runs for a long time and finally fails with the first error message: “ERROR: (gcloud.app.deploy) Error Response: [4] Timed out waiting for the app infrastructure to become healthy.”

Sometimes it shows a second error message instead: “ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.”

Finally, GAE shows me a third error message, which says something like “operation timed out”.

Extra deployment details:

A new project created under GCP
Deploying with gcloud commands over SSH
Using a CPU instance, not a GPU instance, for deployment
Deploying to Google App Engine (GAE)

I researched online a bit. People mention a possible IP address quota issue, but I have enough IP address quota for now and am not over the limit.

I can think of 3 possible solutions:
1) Update the health check settings in app.yaml to prevent the deployment from failing?

2) Delete the app, recreate it, and choose a new region.

3) I used export.pkl to deploy successfully on render.com, but I used stage.pth as the model file on Google App Engine and it failed, so I am confused about the difference between these 2 model files.

UPDATE: After spending a couple of days testing and looking for solutions, I think I have solved the issue. Please see the steps below.

How to deploy your app on Google App Engine using the gcloud command line

Make sure you follow the GCP server setup guide exactly to create a GPU instance for your machine learning model. Don’t forget to make it a preemptible instance, because that is a lot cheaper. https://course.fast.ai/start_gcp.html
Make sure you have the latest fastai version and libraries in your Jupyter notebook development environment. Learn more here: https://course.fast.ai/update_gcp.html
Make sure your GPU or CPU instance is turned on when you deploy your first app. If you change your GitHub code, you can run ‘git pull repositoryURL’ in the gcloud SSH session to update your code.

If you follow the steps above exactly and then deploy the model with ‘gcloud app deploy’, you may still run into 3 major issues. Here is how I fixed each one.

In server.py, convert the prediction to a string: JSONResponse({'result': learn.predict(img)[0]}) -> JSONResponse({'result': str(learn.predict(img)[0])})
In server.py, rename the argument: tfms=get_transforms() -> ds_tfms=get_transforms()
In app.yaml, update your environment and machine type specs as follows:

runtime: custom
env: flex
readiness_check:
  app_start_timeout_sec: 1800
resources:
  cpu: 2
  memory_gb: 12
  disk_size_gb: 100
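
If you change the cpu or memory_gb values, it may help to sanity-check them before deploying, since each failed attempt can take a long time. Below is a minimal sketch in Python. It assumes, based on my reading of the App Engine flexible app.yaml reference, that each CPU core allows between 0.9 and 6.5 GB of total memory and that about 0.4 GB is reserved as overhead. Please verify these constants against the current documentation. The helper names memory_range and memory_ok are hypothetical, not part of any Google tool.

```python
# Values copied from the app.yaml above.
config = {
    "readiness_check": {"app_start_timeout_sec": 1800},
    "resources": {"cpu": 2, "memory_gb": 12, "disk_size_gb": 100},
}

# ASSUMED limits (check the current app.yaml reference before relying on them).
MIN_GB_PER_CORE = 0.9
MAX_GB_PER_CORE = 6.5
OVERHEAD_GB = 0.4


def memory_range(cpu):
    """Return the (lowest, highest) memory_gb assumed valid for this many cores."""
    low = cpu * MIN_GB_PER_CORE - OVERHEAD_GB
    high = cpu * MAX_GB_PER_CORE - OVERHEAD_GB
    return low, high


def memory_ok(resources):
    """True if memory_gb falls inside the assumed range for the requested cpu."""
    low, high = memory_range(resources["cpu"])
    return low <= resources["memory_gb"] <= high


if __name__ == "__main__":
    low, high = memory_range(config["resources"]["cpu"])
    print(f"allowed memory_gb for this cpu count: {low:.1f} to {high:.1f}")
    print("memory_gb ok:", memory_ok(config["resources"]))
```

Under these assumptions, 12 GB fits within what 2 cores allow, while the same 12 GB with cpu: 1 would be rejected.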

The direct link to my app.yaml file is https://github.com/webapp88/google-app-engine/blob/master/app.yaml. You can also check your own custom machine type specification in the attached image below.
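
To show why the str() fix in server.py helps, here is a minimal sketch using only the Python standard library. It assumes that learn.predict(img)[0] returns a label object rather than a plain string, which a JSON encoder cannot serialize by default. FakeCategory is a hypothetical stand-in for that object, and to_payload is an illustrative helper, not code from the repository.

```python
import json


class FakeCategory:
    """Hypothetical stand-in for the object returned by learn.predict(img)[0]."""

    def __init__(self, label):
        self.label = label

    def __str__(self):
        return self.label


def to_payload(prediction):
    """Build the response body, converting the prediction to a plain string."""
    return {"result": str(prediction)}


pred = FakeCategory("grizzly")

# Passing the object directly fails: the encoder does not know this type.
try:
    json.dumps({"result": pred})
    raw_ok = True
except TypeError:
    raw_ok = False

# Wrapping it in str() first gives a body that serializes cleanly.
encoded = json.dumps(to_payload(pred))

if __name__ == "__main__":
    print("raw object serializable:", raw_ok)
    print("encoded response:", encoded)
```

The same idea applies to JSONResponse in server.py: the response body has to consist of plain JSON types, so the predicted label is converted to a string before it is returned.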

My updated GitHub repository: https://github.com/webapp88/google-app-engine
Working Render.com web app: https://bear-vision.onrender.com/

Thank you!
Best regards,