I am trying to follow the Production guide on Deploying on Google App Engine, but the Cloud Build keeps failing. I then tested it locally by running python app/server.py serve and got the same error as in Cloud Build, shown below:
(test-app) Jis-MacBook-Pro-Work:google-app-engine ji$ python app/server.py serve
Traceback (most recent call last):
File "app/server.py", line 37, in <module>
learn = loop.run_until_complete(asyncio.gather(*tasks))[0]
File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
return future.result()
File "app/server.py", line 30, in setup_learner
tfms=get_transforms(), size=224).normalize(imagenet_stats)
File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/fastai/vision/data.py", line 165, in single_from_classes
return sd.label_const(0, label_cls=CategoryList, classes=classes).transform(ds_tfms, **kwargs).databunch()
TypeError: transform() got multiple values for argument 'tfms'
Has anyone deployed / tested the app locally and encountered the same error that I did? Thanks.
I ran into the same issue. The starter code for Google App Engine uses a deprecated method, single_from_classes. Use load_learner instead and it should work. Please check out my answer in this post: Gcloud app engine error
Thank you very much for the useful info. By following your post I was able to get rid of the transform-related problem. However, I now get another error, an AttributeError complaining about a non-existent attribute ImageItemList. It looks like a version problem with my fastai package, but I am already using the latest version, 1.0.46. Have you encountered this problem in your case?
/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/torch/serialization.py:435: SourceChangeWarning: source code of class 'torchvision.models.resnet.BasicBlock' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/torch/serialization.py:435: SourceChangeWarning: source code of class 'fastai.layers.Flatten' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
File "app/server.py", line 42, in <module>
learn = loop.run_until_complete(asyncio.gather(*tasks))[0]
File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
return future.result()
File "app/server.py", line 30, in setup_learner
learn = load_learner(path / 'models')
File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/fastai/basic_train.py", line 549, in load_learner
state = torch.load(Path(path)/fname, map_location='cpu') if defaults.device == torch.device('cpu') else torch.load(Path(path)/fname)
File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/torch/serialization.py", line 368, in load
return _load(f, map_location, pickle_module)
File "/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/torch/serialization.py", line 542, in _load
result = unpickler.load()
AttributeError: Can't get attribute 'ImageItemList' on <module 'fastai.vision.data' from '/Users/ji/anaconda3/envs/test-app/lib/python3.7/site-packages/fastai/vision/data.py'>
[Solved]
It turns out I was getting AttributeError: Can't get attribute 'ImageItemList' because export.pkl was generated with a different version of fastai (1.0.42) from the one used in production (1.0.46). Thanks to the explanation here. After I updated my development env to fastai 1.0.46 and retrained the model, the error went away.
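To catch this kind of mismatch before unpickling, the server could compare versions at startup. This is only a minimal sketch: check_export_version and EXPORTED_WITH are hypothetical names, not part of the starter code, and it assumes you note down the fastai version yourself when you export the model.

```python
# Hypothetical constant: the fastai version you noted when creating export.pkl.
EXPORTED_WITH = "1.0.46"


def check_export_version(running, exported_with=EXPORTED_WITH):
    """Raise RuntimeError if the running version differs from the export version.

    Both arguments are plain version strings such as "1.0.46".
    """
    if running != exported_with:
        raise RuntimeError(
            f"export.pkl was created with fastai {exported_with}, "
            f"but this environment runs fastai {running}"
        )


# Possible use in server.py, before load_learner() is called:
#   import fastai
#   check_export_version(fastai.__version__)
```

An exact string comparison is deliberately strict. In this thread even a patch-level difference (1.0.42 vs 1.0.46) was enough to break unpickling.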
Other complications that I had to overcome to deploy the app on Google App Engine:
server.py:
In setup_learner(), I now call load_learner() instead of the deprecated factory method ImageDataBunch.single_from_classes(), many thanks to the answer from @jinudaniel.
In analyze(), the first item of the tuple returned by learn.predict(img) is now a Category object (in fastai v1.0.46) instead of the class name string (in fastai v1.0.42), so you need the .obj attribute to get the string.
app.yaml:
gcloud app deploy kept timing out because the build and deploy take a long time, so I added the following lines to app.yaml:
readiness_check:
  app_start_timeout_sec: 600
Hopefully this info will help others who run into similar difficulties.
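The analyze() change mentioned above can be sketched as a small helper that works with either return type. This is an illustration only: class_name and FakeCategory are hypothetical names, and FakeCategory merely stands in for the fastai Category object so the snippet runs without fastai installed.

```python
def class_name(prediction):
    """Return the class name as a str.

    `prediction` is the first item of the tuple from learn.predict(img):
    a plain string in fastai 1.0.42, an object with an .obj attribute in 1.0.46.
    """
    # Use .obj when it exists, otherwise treat the value itself as the name.
    return str(getattr(prediction, "obj", prediction))


class FakeCategory:
    """Hypothetical stand-in for fastai's Category, used for demonstration only."""

    def __init__(self, obj):
        self.obj = obj


old_style = class_name("teddy")                # value as returned by 1.0.42
new_style = class_name(FakeCategory("teddy"))  # value as returned by 1.0.46
```

Both calls give the same string, so the JSON response built in analyze() would not need to care which fastai version produced the prediction.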
@xujiboy When deploying on Google App Engine, are you having any issues installing the fastai module? I think I am running out of memory when installing, so can’t proceed any further. Were you able to do this on the standard GAE?
@arajendran Hi, I followed the exact instructions in the fastai deployment tutorial and have not experienced any memory issues during deployment. However, I am using the Flexible environment (which is also the default setting in the app.yaml). If you are using the Standard environment there could be issues (I am just speculating here), since Google has guidance on when to choose which. This app is deployed through a Docker container, so the Flexible environment is what they recommend. See this page for more information.
@xujiboy Ok thanks, I had a feeling it was because I am running in Standard environment, I will try migrating to Flexible and give it a shot. Thanks for the input.
Hi @xujiboy, I had the same problems and your hints helped. Thank you very much.
However, I’m experiencing some strange behaviour. I download my exported model file to the models subfolder, but once I call load_learner(app_path/'models', model_file_name) there is another empty subfolder models inside the original models folder, i.e. I end up with
If I do not call load_learner() then the empty subfolder is not created.
I also get
/home/schorsch/.local/lib/python3.7/site-packages/torch/serialization.py:454: SourceChangeWarning: source code of class 'torchvision.models.resnet.BasicBlock' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
during runtime.
A pip list on my local machine and a conda list on my gcp training machine showed that locally I have torchvision 0.3.0 but on my remote gcp training machine I have 0.2.2.
How can I safely update torchvision without breaking the whole PyTorch/CUDA/fastai toolchain on my training machine? I am new to this. Should I actually worry at all, or just ignore the SourceChangeWarning above? The app seems to work, but resnet.BasicBlock sounds kind of alarming.
Does anybody have any advice on this? Should I just ignore the warning? It still sounds alarming. Is there a possibility that my model is not unpickled correctly or would that result in an actual error?
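To see every package that differs between the two machines, not just torchvision, the two listings can be compared mechanically. A minimal sketch, assuming both listings have been saved as name==version lines (the format printed by pip list --format=freeze); parse_freeze and version_diff are hypothetical helper names.

```python
def parse_freeze(text):
    """Parse lines of the form name==version into a {name: version} dict."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a pinned version.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[name.lower()] = version
    return versions


def version_diff(local_text, remote_text):
    """Return {name: (local_version, remote_version)} for packages that differ.

    Only packages present in both listings are compared.
    """
    local = parse_freeze(local_text)
    remote = parse_freeze(remote_text)
    shared = sorted(local.keys() & remote.keys())
    return {n: (local[n], remote[n]) for n in shared if local[n] != remote[n]}
```

With the versions mentioned above, torchvision would show up as ("0.3.0", "0.2.2") in the result. This only reports differences; it does not say whether a given difference is harmful.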
I have successfully deployed the fastai code by following the instructions in “Deploying on Google App Engine”, but at the last step I get the error “502 Bad Gateway” from “nginx”. Does anyone know why? Thanks in advance!
Unfortunately, I’ve run into another issue. I re-ran my app deploy and it seemed to run fine because it ended with “latest: digest: sha256:20208697ede6c8df090c13c2e00d9a43cfe4be833d07602419a786c419a1b71b size: 2635
DONE”
But then it gave me this error “Updating service [default] (this may take several minutes)…failed.
ERROR: (gcloud.app.deploy) Error Response: [9]
Application startup error! Code: APP_CONTAINER_CRASHED”
Any ideas? I went to my app instance to restart it and it seems to be up and running, but I’m still running into 502 server error.
My app.yaml file looks like this.
runtime: custom
env: flex
readiness_check:
  app_start_timeout_sec: 1800
I checked using the python -m fastai.utils.show_install command and I have the same fastai version both in Google Colab and my local environment. This is the error I’m getting: