[SOLVED] Error when trying to deploy model on Google App Engine (and locally)

Hi,

I am trying to follow the Production guide on Deploying on Google App Engine but keep failing the Cloud Build. I then decided to test it locally by running python app/server.py serve and got the same error that I got from Cloud Build, as shown below:

(test-app) Jis-MacBook-Pro-Work:google-app-engine ji$ python app/server.py serve
Traceback (most recent call last):
  File "app/server.py", line 37, in <module>
    learn = loop.run_until_complete(asyncio.gather(*tasks))[0]
  File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "app/server.py", line 30, in setup_learner
    tfms=get_transforms(), size=224).normalize(imagenet_stats)
  File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/fastai/vision/data.py", line 165, in single_from_classes
    return sd.label_const(0, label_cls=CategoryList, classes=classes).transform(ds_tfms, **kwargs).databunch()
TypeError: transform() got multiple values for argument 'tfms'

Has anyone deployed / tested the app locally and encountered the same error that I did? Thanks.

Ji

I ran into the same issue. The starter code for Google App Engine uses the deprecated method single_from_classes. Use load_learner instead and it should work. Please check out my answer in this post: Gcloud app engine error

Thank you very much for the useful info. By referring to your post I was able to get rid of the transform-related problem. However, I now get another error, an AttributeError complaining about a non-existent attribute ImageItemList. It looks to me like a version problem with my fastai package, but I am already using the latest version, 1.0.46. Have you encountered this problem in your case?

/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/torch/serialization.py:435: SourceChangeWarning: source code of class 'torchvision.models.resnet.BasicBlock' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/torch/serialization.py:435: SourceChangeWarning: source code of class 'fastai.layers.Flatten' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
  File "app/server.py", line 42, in <module>
    learn = loop.run_until_complete(asyncio.gather(*tasks))[0]
  File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "app/server.py", line 30, in setup_learner
    learn = load_learner(path / 'models')
  File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/fastai/basic_train.py", line 549, in load_learner
    state = torch.load(Path(path)/fname, map_location='cpu') if defaults.device == torch.device('cpu') else torch.load(Path(path)/fname)
  File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/torch/serialization.py", line 368, in load
    return _load(f, map_location, pickle_module)
  File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/torch/serialization.py", line 542, in _load
    result = unpickler.load()
AttributeError: Can't get attribute 'ImageItemList' on <module 'fastai.vision.data' from '/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/fastai/vision/data.py'>

[Solved]
It turns out that I was getting the error AttributeError: Can't get attribute 'ImageItemList' because I generated export.pkl with a different version of fastai (1.0.42) from the one used in production (1.0.46). Thanks to the explanation here. After I updated my development env to fastai 1.0.46 and retrained the model, the error went away.
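
A minimal sketch of guarding against this kind of mismatch, assuming you note down the fastai version used for training yourself (for example from fastai.__version__ on the training machine) and compare it with the version installed where the app runs before loading export.pkl. The helper name check_fastai_versions is my own invention for illustration, not part of the starter code or of fastai.

```python
from typing import Optional


def check_fastai_versions(trained_with: str, running: str) -> Optional[str]:
    """Return a description of the problem, or None if the versions match.

    Hypothetical helper: both version strings are supplied by the caller,
    e.g. read from `fastai.__version__` on each machine.
    """
    if trained_with.strip() == running.strip():
        return None
    return (
        f"export.pkl was created with fastai {trained_with} but the app runs "
        f"fastai {running}; re-export with the same version or pin it in "
        f"your requirements file"
    )


# Versions from this thread: 1.0.42 (training) vs 1.0.46 (production)
problem = check_fastai_versions("1.0.42", "1.0.46")
if problem:
    print(problem)
```

Failing early with a message like this is easier to diagnose than the unpickling error in the traceback above.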

Other complications that I have overcome to deploy the app on Google App Engine:

server.py:

  1. setup_learner() now calls load_learner() instead of the deprecated factory method ImageDataBunch.single_from_classes(), many thanks to the answer from @jinudaniel.
  2. In analyze(), the first item of the tuple returned by learn.predict(img) is a Category object in fastai v1.0.46, not the class name string as in fastai v1.0.42. To get the string, use its .obj attribute.
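
A small sketch of point 2, assuming you want analyze() to cope with both behaviours: take the first item of the learn.predict(img) tuple and use .obj when it is present, falling back to the value itself otherwise. FakeCategory is only a stand-in for fastai's Category so that the snippet runs without fastai installed, and class_name is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeCategory:
    """Stand-in for fastai's Category (illustration only)."""
    data: int  # class index
    obj: str   # class name


def class_name(pred_class: Any) -> str:
    """Return the class name for a Category-like object or a plain string."""
    return str(getattr(pred_class, "obj", pred_class))


# In analyze() this would be: pred_class = learn.predict(img)[0]
print(class_name(FakeCategory(data=1, obj="castle")))
print(class_name("castle"))
```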

app.yaml:

  1. gcloud app deploy kept timing out because of the long build and deploy time, so I added the following lines to app.yaml:
readiness_check:
  app_start_timeout_sec: 600

Hopefully this info will help others who run into similar difficulties.

I had exactly the same problem as you, and your notes allowed me to successfully deploy my web app.

@subasa Glad that it helps.

@xujiboy When deploying on Google App Engine, are you having any issues installing the fastai module? I think I am running out of memory when installing, so can’t proceed any further. Were you able to do this on the standard GAE?

@arajendran Hi, I followed the exact instructions in the fastai deployment tutorial and did not experience any memory issues during deployment. However, I am using the Flexible environment (which is also the default setting in the app.yaml). If you are using the Standard environment there could be issues (I am just speculating here), since Google has guidance on when to choose which. This app is deployed as a Docker container, so the Flexible environment is what they recommend. See this page for more information.

@xujiboy Ok thanks, I had a feeling it was because I am running in Standard environment, I will try migrating to Flexible and give it a shot. Thanks for the input.

Hi @xujiboy, I had the same problems and your hints helped. Thank you very much.

However, I’m experiencing some strange behaviour. I download my exported model file to the models subfolder, but once I call load_learner(app_path/'models', model_file_name) there is another empty subfolder models inside the original models folder, i.e. I end up with

app/models/models.md
app/models/BuildingsModel.pkl
app/models/models/

Any idea why that is? My code is as follows:

model_file_name = 'BuildingsModel.pkl'
classes = ['cabin', 'castle', 'house']
app_path = Path(__file__).parent

# [...]

async def setup_learner():
    await download_file(model_file_url, app_path/'models'/model_file_name)
    # empty subfolder 'models/models/' not present yet
    learner = load_learner(app_path/'models', model_file_name)
    # empty subfolder 'models/models/' present now
    return learner

If I do not call load_learner() then the empty subfolder is not created.

I also get

/home/schorsch/.local/lib/python3.7/site-packages/torch/serialization.py:454: SourceChangeWarning: source code of class 'torchvision.models.resnet.BasicBlock' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)

during runtime.

A pip list on my local machine and a conda list on my gcp training machine showed that locally I have torchvision 0.3.0 but on my remote gcp training machine I have 0.2.2.

How can I safely update torchvision without breaking the whole PyTorch/CUDA/fastai toolchain on my training machine? I am new to this. Should I actually worry at all, or just ignore the SourceChangeWarning above? The app seems to work, but resnet.BasicBlock sounds kind of alarming. :sweat_smile:

Thanks a lot in advance for all your help.

Does anybody have any advice on this? Should I just ignore the warning? It still sounds alarming. Is there a possibility that my model is not unpickled correctly or would that result in an actual error?
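
If you decide the warning is harmless in your setup (that is a judgment call; training and serving with matching library versions is the safer route), here is a sketch for hiding it with Python's standard warnings module. It filters on the message text shown in the log above rather than importing anything from torch. The name silence_source_change_warnings is hypothetical, and the call would have to happen before load_learner().

```python
import warnings


def silence_source_change_warnings() -> None:
    """Ignore warnings whose message starts with 'source code of class'.

    The pattern is matched against the start of the warning message,
    which is how the SourceChangeWarning text in the log above begins.
    """
    warnings.filterwarnings("ignore", message=r"source code of class")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    silence_source_change_warnings()
    # Simulated messages; the real one is emitted by torch while unpickling.
    warnings.warn("source code of class 'X' has changed.", UserWarning)
    warnings.warn("some other warning", UserWarning)

print(len(caught))  # only the unrelated warning is recorded
```

Filtering by message keeps every other warning visible, so unrelated problems still show up in the logs.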

I have successfully deployed the fastai code by following the instructions in “Deploying on Google App Engine”, but at the last step I get a “502 Bad Gateway” error from nginx. Does anyone know why? Thanks in advance!

I am getting the same error, have you solved it?

I’m running into the same issue. Did you find a fix?

yes, in your app.yaml file add this:

readiness_check:
  app_start_timeout_sec: 1800

should work after that.

Unfortunately, I’ve run into another issue. I re-ran my app deploy and it seemed to run fine because it ended with “latest: digest: sha256:20208697ede6c8df090c13c2e00d9a43cfe4be833d07602419a786c419a1b71b size: 2635
DONE”

But then it gave me this error “Updating service [default] (this may take several minutes)…failed.
ERROR: (gcloud.app.deploy) Error Response: [9]
Application startup error! Code: APP_CONTAINER_CRASHED”

Any ideas? I went to my app instance to restart it and it seems to be up and running, but I’m still running into 502 server error.

My app.yaml file looks like this.
runtime: custom
env: flex
readiness_check:
app_start_timeout_sec: 1800

Did I add the readiness_check to the right area?

Hi, I’m experiencing the same issue.

I checked using the python -m fastai.utils.show_install command and I have the same fastai version both in Google Colab and my local environment. This is the error I’m getting:

Unexpected key(s) in state_dict: "opt_func", "loss_func", "metrics", "true_wd", "bn_wd", "wd", "train_bn", "model_dir", "callback_fns", "cb_state", "model", "data", "cls". 

Any idea why this may be happening?

Thanks.