Productionizing models thread

I have a problem with productionizing a segmentation model on Google App Engine.

To begin with, I followed the official tutorial and successfully deployed Jeremy’s bear classification model. I also managed to run a different categorical model using ImageDataBunch.load_empty instead of ImageDataBunch.single_from_classes. (That would be teenage mutant ninja turtle recognition. :sweat_smile:)

Now, on to the segmentation example. I want the app to display a segmented picture. I modified the crucial part of server.py as follows:

   @app.route('/analyze', methods=['POST'])
   async def analyze(request):
       [...]
       return JSONResponse({'result':img.show(y=learn.predict(img)[0], figsize=(8,8))})

The app would always return “Result = null” (I was running the app server locally), presumably because img.show() renders the image with matplotlib rather than returning anything JSON-serializable. Trying to get to the bottom of the issue, I replaced the last line of the above snippet with

return JSONResponse({'result':learn.predict(img)[0]})

and learned that “Object of type ‘ImageSegment’ is not JSON serializable”. Here’s the full error message:

ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/zbigniew/.local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 368, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/home/zbigniew/.local/lib/python3.6/site-packages/uvicorn/middleware/asgi2.py", line 7, in __call__
    await instance(receive, send)
  File "/home/zbigniew/.local/lib/python3.6/site-packages/starlette/middleware/errors.py", line 125, in asgi
    raise exc from None
  File "/home/zbigniew/.local/lib/python3.6/site-packages/starlette/middleware/errors.py", line 103, in asgi
    await asgi(receive, _send)
  File "/home/zbigniew/.local/lib/python3.6/site-packages/starlette/middleware/cors.py", line 138, in simple_response
    await inner(receive, send)
  File "/home/zbigniew/.local/lib/python3.6/site-packages/starlette/exceptions.py", line 74, in app
    raise exc from None
  File "/home/zbigniew/.local/lib/python3.6/site-packages/starlette/exceptions.py", line 63, in app
    await instance(receive, sender)
  File "/home/zbigniew/.local/lib/python3.6/site-packages/starlette/routing.py", line 41, in awaitable
    response = await func(request)
  File "app/server.py", line 48, in analyze
    return JSONResponse({'result':learn.predict(img)[0]})
  File "/home/zbigniew/.local/lib/python3.6/site-packages/starlette/responses.py", line 43, in __init__
    self.body = self.render(content)
  File "/home/zbigniew/.local/lib/python3.6/site-packages/starlette/responses.py", line 150, in render
    separators=(",", ":"),
  File "/usr/lib/python3.6/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'ImageSegment' is not JSON serializable

I guess I should replace JSONResponse with something else, but the problem is I know next to nothing about JavaScript and find its code hard to read. I googled a bit and it seems that a possible workaround is to convert the image to a base64 string, but I don’t quite know how to do it in this particular setting. Perhaps there is a simpler solution and someone could help me out?
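
For reference, here is one way the base64 workaround could look. This is only a hedged sketch, assuming fastai v1 where learn.predict(img)[0] returns an ImageSegment whose .px tensor holds the predicted mask; the PIL-based image handling is an illustrative choice, not part of the original tutorial code.

    # hedged sketch of the base64 workaround (fastai v1 assumed)
    import base64
    from io import BytesIO

    import numpy as np
    import PIL.Image

    @app.route('/analyze', methods=['POST'])
    async def analyze(request):
        # [...] obtain `img` from the uploaded bytes as before
        mask = learn.predict(img)[0]                       # ImageSegment
        arr = mask.px.squeeze().numpy().astype(np.uint8)   # one class index per pixel
        # scale or colour-map `arr` here if the raw class indices are too dark to see
        buf = BytesIO()
        PIL.Image.fromarray(arr).save(buf, format='PNG')
        b64 = base64.b64encode(buf.getvalue()).decode('ascii')
        return JSONResponse({'result': b64})

On the client side, the returned string can be dropped straight into an <img> tag as a data:image/png;base64,... source, so no extra JavaScript image handling is needed.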

Would you be able to give any more details on the packages you had in your 107 MB zip file? I am currently trying to do something similar, but I was thinking of creating a separate layer just for fastai that could sit on top of the existing PyTorch layer mentioned in the AWS Lambda example: https://course.fast.ai/deployment_aws_lambda.html

It took a while, but I was finally able to deploy my model (an image classifier of penguins) to AWS Lambda, using AWS Layers.

The issue, as many people have found, was the size of the dependencies. I decided to use the PyTorch layer mentioned in the AWS Lambda example on the course page, and then I created a second layer with fastai and other dependencies.

I did a full write-up on GitHub on how I made this layer, see here. Essentially, by trial and error, I removed packages that were not needed for my function to run. It was slightly hacky, as it involved commenting out an import line in one of the .py files in my packages directory, so I can’t guarantee that this would work for all types of models. But it should work for straightforward image classification (i.e. pets from lesson 1 and bears from lesson 2).

The code is in my GitHub repository:
server side code
client side code

and the live website is here: https://www.patagoniapenguins.org/which-penguin/


Hi, is there a way to force predict() to use the GPU? I understand that in most scenarios predicting on the CPU is preferred, but I am working with a pretrained model exported to an NVIDIA Jetson Nano, where the ARM CPU is much less powerful than the GPU.

Right now, when I call predict(), my CPU usage goes to 100% and most of my memory/swap gets used. If I can force inference to happen on the GPU, then the rest of my application won’t get slowed down.

Thanks!

Bumping thread for help on this. Apologies in advance if that’s frowned upon.

The proper “method” would be to make a databunch out of a test set that you want to run inference on quickly, and load that in too. Then when you export the model, you need to specify keeping the GPU version, not the CPU one.
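
A rough sketch of that test-set approach, assuming fastai v1 with an exported learner; the folder names below are placeholders:

    # rough sketch of batch inference via a test set (fastai v1 assumed);
    # 'export_dir' and 'path/to/new_images' are placeholder paths
    from fastai.vision import *

    learn = load_learner('export_dir', test=ImageList.from_folder('path/to/new_images'))
    preds, _ = learn.get_preds(ds_type=DatasetType.Test)

Whether this actually runs on the GPU still depends on where the model ends up, which is what the replies below about defaults.device and model.to(device) address.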


Hi @RobHammann

Since the current fastai is a wrapper on top of PyTorch, what you can do is first check whether a GPU is available and then send the data/model to it.

import torch

# use 'cuda'/GPU if available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
input = data.to(device)            # move the input tensor/batch to the chosen device
model = MyModule(...).to(device)   # move the model's parameters to the same device

I hope this helps a bit. You can find more docs at https://pytorch.org/docs/stable/tensor_attributes.html?highlight=device#torch.torch.device

Cheers

Bo


Thanks for the quick response @muellerzr. Are you suggesting I load the images that I want to perform inference on into a databunch when I load the model? I plan on using this model on streaming images from a camera, which wouldn’t be available when the model is loaded. If I were to load the model with one image each time I want to predict, load_learner() takes ~14 seconds, which is slow.

How do I do this? I’ve been using the learner’s export() and I don’t see any option for specifying the GPU.

Thank you for the quick reply as well. If I understand correctly, I need a native PyTorch module to use .to(). How do I get from my fastai-trained resnet18 learner to a PyTorch model that can be sent to the GPU?

Hi @RobHammann, good question!

Since fastai is a wrapper around PyTorch, one thing we can do is look into basic_train.py and see what the Learner class has.
At line 166 of basic_train.py, we can see the model being moved to the data’s device:

        self.model = self.model.to(self.data.device)  

So, with this new knowledge in mind, I think there are two paths we can take.

One is at learner/data setup time: we can update self.data.device to whatever we want. I think this way is good for training and maybe also for serving (productionizing).
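
A minimal sketch of that first option, assuming fastai v1, where the DataBunch (and load_learner) fall back to defaults.device when no device is passed explicitly; the paths and architecture below are placeholders:

    # minimal sketch of the first option (fastai v1 assumed): set defaults.device
    # before building the data and learner, so the data loaders and the model
    # all end up on the GPU when one is available
    import torch
    from fastai.vision import *

    defaults.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    data = ImageDataBunch.from_folder('path/to/data')   # placeholder path; uses defaults.device
    learn = cnn_learner(data, models.resnet18)          # Learner moves the model to data.device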

Another way is at serving time: you manually move the model to the device you want via the functions I mentioned above.

I think for your use case, the second way might be better. Here is a basic sample:

import torch
from fastai.basic_train import load_learner

my_learner = load_learner(PATH_TO_MY_SAVED_MODEL)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
my_learner.model.to(device)

Let me know how it goes. I am not an expert in this, so let’s learn together.

Cheers

Bo

I solved my problem, even though I’m not entirely sure why. After thinking about the interesting facts from @yubozhao, I tried to force my model onto the CPU to see if I got consistent results. After a couple of (very slow) tests on the CPU, I switched all instances of device back to cuda and it started working.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
defaults.device = torch.device("cuda")

Previously I had used print(defaults.device) and it always printed “cuda” but I didn’t explicitly set it until I added that second line. Not sure if that fixed it. Thanks to all who helped!

Hi

I followed your approach to create the layer file, but when I tried to build the Docker image it stopped at pip install fastai: it could not install bottleneck. Did you run into this issue?

Thanks
Dong

Hi Dong

I just tried re-running my Dockerfile, and yes, I am now getting an error with the bottleneck dependency. This wasn’t happening before, so something must have changed in the dependencies.

Update:

It seems that the bottleneck package fails to install with the latest version of numpy, 1.17.0 (released 26-Jul-19). I think this can be worked around by installing the previous numpy version, 1.16.4, together with bottleneck in the Docker image before installing PyTorch and fastai.

I’ve updated the notes in my github repo:

Hopefully that will resolve the issue…

Thanks
Laura


Hi Laura

I fixed it in a different way. I think the root cause is that bottleneck can’t find the numpy install, because numpy is installed with the --target option into a different folder, and that folder is not on the PYTHONPATH. I added the following line before installing fastai to update the PYTHONPATH so bottleneck can find numpy, and it fixed the issue.

ENV PYTHONPATH "${PYTHONPATH}:/home/fastai/install/python/lib/python3.6/site-packages"

Dong

OK, I will add that to my notes. I’m glad you fixed the issue!

Is anyone using the official PyTorch Dockerfile here for deployment?

If so, I’m curious to hear about your experience. If not, I’m interested to know what Docker images folks are using and where they are deploying them to.

I’m especially interested to know how it works in local development vs. when the image is deployed to production (AWS, for example) … like, if I build the image on my MacBook Pro, which doesn’t have a GPU, and then deploy to a server that does, will CUDA/cuDNN be available to use in production?


For our production, we’re using the PyTorch Docker image available in SageMaker. It comes with fastai preinstalled, but we had to upgrade/hardcode it to version 1.0.55 for our use case.


Thanks for the 411.

How are you handling things at inference? For myself, I’m looking to have something that periodically pings an administrative application for textual data to run a number of NLP models against. Ideally, I’d like to handle that in batch form (vs. doing 1-by-1 predictions).

Any thoughts?

-wg

Our production model is hosted as an API to process 1-by-1 predictions on our website.

In another pipeline, we do have reports that require predictions from the above model. These are run periodically (i.e. in batches), but they still leverage the same API. It works for us since we want fewer things to maintain over time.

Not sure whether this is the kind of reply you’re looking for tbh.

How do you handle “continuous learning” in production? For example, calculating model drift and retraining the model. Is there a reference architecture that can be suggested?

thanks
Hari
