Deployment Platform: Render ✅

Hi Mrfabulous1,

  1. I do confirm I’ve tested the model - just re-tested now to make sure.
  2. Yes - from what I see in the list below:

Libraries from requirements.txt:
aiofiles==0.4.0
aiohttp==3.5.4
asyncio==3.4.3
fastai==1.0.60
numpy==1.15.4
starlette==0.12.0
uvicorn==0.7.1
python-multipart==0.0.5
pillow==5.4.1
torch==1.0.0
torchvision==0.2.1

Libraries from !pip list:
asn1crypto 0.24.0
attrs 18.2.0
backcall 0.1.0
beautifulsoup4 4.7.1
bleach 3.1.0
Bottleneck 1.2.1
certifi 2018.11.29
cffi 1.11.5
chardet 3.0.4
cryptography 2.3.1
cycler 0.10.0
cymem 2.0.2
cytoolz 0.9.0.1
dataclasses 0.6
decorator 4.3.0
dill 0.2.8.2
entrypoints 0.3
fastai 1.0.60
fastprogress 0.2.2
idna 2.8
ipykernel 5.1.0
ipython 7.2.0
ipython-genutils 0.2.0
ipywidgets 7.4.2
jedi 0.13.2
Jinja2 2.10
jsonschema 3.0.0a3
jupyter 1.0.0
jupyter-client 5.2.4
jupyter-console 6.0.0
jupyter-core 4.4.0
kiwisolver 1.0.1
MarkupSafe 1.1.0
matplotlib 3.0.2
mistune 0.8.4
mkl-fft 1.0.10
mkl-random 1.0.2
msgpack 0.5.6
msgpack-numpy 0.4.3.2
murmurhash 1.0.0
nb-conda 2.2.1
nb-conda-kernels 2.2.0
nbconvert 5.3.1
nbformat 4.4.0
notebook 5.7.4
numexpr 2.6.9
numpy 1.15.4
nvidia-ml-py3 7.352.0
olefile 0.46
packaging 19.0
pandas 0.23.4
pandocfilters 1.4.2
parso 0.3.1
pexpect 4.6.0
pickleshare 0.7.5
Pillow 5.4.1
pip 18.1
plac 0.9.6
preshed 2.0.1
prometheus-client 0.5.0
prompt-toolkit 2.0.7
ptyprocess 0.6.0
pycparser 2.19
Pygments 2.3.1
pyOpenSSL 18.0.0
pyparsing 2.3.1
pyrsistent 0.14.9
PySocks 1.6.8
python-dateutil 2.7.5
pytz 2018.9
PyYAML 3.13
pyzmq 17.1.2
qtconsole 4.4.3
regex 2018.1.10
requests 2.21.0
scipy 1.2.0
Send2Trash 1.5.0
setuptools 40.6.3
six 1.12.0
soupsieve 1.7.1
spacy 2.0.18
terminado 0.8.1
testpath 0.4.2
thinc 6.12.1
toolz 0.9.0
torch 1.0.0
torchvision 0.2.1
tornado 5.1.1
tqdm 4.29.1
traitlets 4.3.2
typing 3.6.4
ujson 1.35
urllib3 1.24.1
wcwidth 0.1.7
webencodings 0.5.1
wheel 0.32.3
widgetsnbextension 3.4.2
wrapt 1.10.11

Hi Skemu, hope you're still having a jolly day!

Assuming the model is correct and requirements.txt is correct, we have no alternative but to look elsewhere for the problem.

  1. Can you paste your server.py file here, please?
  2. Can you confirm the link to the model is working correctly and fetching the correct model? Normally you can put this link in a browser and it will start to download.

Cheers mrfabulous1 :smile::smiley:

Thanks for looking into this. The jolliness is running thin :frowning:

  1. The link to the model works fine - just tested it now

The server.py file:

import aiohttp
import asyncio
import sys
import uvicorn
from fastai import *
from fastai.vision import *
from io import BytesIO
from starlette.applications import Starlette
from starlette.middleware.cors import CORSMiddleware
from starlette.responses import HTMLResponse, JSONResponse
from starlette.staticfiles import StaticFiles

export_file_url = 'https://drive.google.com/file/d/12jL-v6gtaSMRT_8S9GfQ-EeCuhtbmVIV/view?usp=sharing'
export_file_name = 'export.pkl'

classes = ['rosemary', 'thyme', 'sage']
path = Path(__file__).parent

app = Starlette()
app.add_middleware(CORSMiddleware, allow_origins=['*'], allow_headers=['X-Requested-With', 'Content-Type'])
app.mount('/static', StaticFiles(directory='app/static'))

async def download_file(url, dest):
    if dest.exists(): return
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            data = await response.read()
            with open(dest, 'wb') as f:
                f.write(data)

async def setup_learner():
    await download_file(export_file_url, path / export_file_name)
    try:
        learn = load_learner(path, export_file_name)
        return learn
    except RuntimeError as e:
        if len(e.args) > 0 and 'CPU-only machine' in e.args[0]:
            print(e)
            message = "\n\nThis model was trained with an old version of fastai and will not work in a CPU environment.\n\nPlease update the fastai library in your training environment and export your model again.\n\nSee instructions for 'Returning to work' at https://course.fast.ai."
            raise RuntimeError(message)
        else:
            raise

loop = asyncio.get_event_loop()
tasks = [asyncio.ensure_future(setup_learner())]
learn = loop.run_until_complete(asyncio.gather(*tasks))[0]
loop.close()

@app.route('/')
async def homepage(request):
    html_file = path / 'view' / 'index.html'
    return HTMLResponse(html_file.open().read())

@app.route('/analyze', methods=['POST'])
async def analyze(request):
    img_data = await request.form()
    img_bytes = await (img_data['file'].read())
    img = open_image(BytesIO(img_bytes))
    prediction = learn.predict(img)[0]
    return JSONResponse({'result': str(prediction)})

if __name__ == '__main__':
    if 'serve' in sys.argv:
        uvicorn.run(app=app, host='0.0.0.0', port=5000, log_level="info")

Hi, I can deploy the test app just fine. But if I try to deploy my model it runs out of memory and the app gets killed (“Server failed; Out of memory (used over 512MB)”).
I’m surprised since it’s the same resnet34 architecture, although it uses a lot more classes. Is there something I can do to reduce the memory usage of my model?

This is the repo: https://github.com/leoauri/audio-auto-tag/tree/master

Hi leoauri, hope all is well.

I believe there is a 512MB memory limit on the basic Render account (which matches the "used over 512MB" in your error).
If your model file is large, try testing your app with a smaller model; that may work.
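Before deploying to a 512MB instance, a quick sanity check is simply the size of the exported model file on disk. A minimal sketch (the helper name and path are illustrative, not from the repo):

```python
from pathlib import Path

def model_size_mb(path):
    """Size of a file in megabytes, e.g. the exported export.pkl."""
    return Path(path).stat().st_size / (1024 * 1024)
```

For example, `model_size_mb('export.pkl')` on your training machine tells you roughly how much of the instance's memory the loaded weights alone will need, before Python, torch, and fastai overhead.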

Cheers mrfabulous1 :smiley::smiley:

I just tried to deploy for the first time with Render but the install failed on installing Pillow??

Jan 22 08:49:38 PM  INFO[0002] Downloading base image python:3.7-slim-stretch

Jan 22 08:49:46 PM INFO[0010] RUN apt-get update && apt-get install -y git python3-dev gcc && rm -rf /var/lib/apt/lists/*
Jan 22 08:50:02 PM INFO[0026] COPY requirements.txt .
Jan 22 08:50:02 PM INFO[0026] extractedFiles: [/requirements.txt /]
Jan 22 08:50:02 PM INFO[0027] RUN pip install --upgrade -r requirements.txt
Jan 22 08:50:56 PM INFO[0080] COPY app app/
Jan 22 08:50:56 PM INFO[0080] RUN python app/server.py
Jan 22 08:51:07 PM Traceback (most recent call last):
File "app/server.py", line 5, in <module>
from fastai.vision import *
File "/usr/local/lib/python3.7/site-packages/fastai/vision/__init__.py", line 3, in <module>
from .learner import *
File "/usr/local/lib/python3.7/site-packages/fastai/vision/learner.py", line 6, in <module>
from . import models
File "/usr/local/lib/python3.7/site-packages/fastai/vision/models/__init__.py", line 2, in <module>
from torchvision.models import ResNet,resnet18,resnet34,resnet50,resnet101,resnet152
File "/usr/local/lib/python3.7/site-packages/torchvision/__init__.py", line 2, in <module>
from torchvision import datasets
File "/usr/local/lib/python3.7/site-packages/torchvision/datasets/__init__.py", line 9, in <module>
from .fakedata import FakeData
File "/usr/local/lib/python3.7/site-packages/torchvision/datasets/fakedata.py", line 3, in <module>
from .. import transforms
File "/usr/local/lib/python3.7/site-packages/torchvision/transforms/__init__.py", line 1, in <module>
from .transforms import *
File "/usr/local/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 17, in <module>
from . import functional as F
File "/usr/local/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 5, in <module>
from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
ImportError: cannot import name 'PILLOW_VERSION' from 'PIL' (/usr/local/lib/python3.7/site-packages/PIL/__init__.py)
Jan 22 08:51:07 PM error building image: error building stage: failed to execute command: waiting for process to exit: exit status 1
Jan 22 08:51:07 PM error: exit status 1

Any advice here? I just used the standard setup with nothing special…

Thanks
Less

Hi LessW2020, hope you're having a jolly day.

If you search this thread for pip list you will find that the majority of people train their models on one platform and then transfer the model to render.com.

However, the library versions often differ between the training platform (I use Colab) and the versions in the requirements.txt of the deployed app created with the teddy bear repo.

So the first step in problem solving is to run pip list on your training platform and confirm the versions are the same as those in your deployed app's requirements.txt; if not, change requirements.txt in your deployed app.
Depending on the differences, it can be none or all of them :unamused:

Also, if you search this thread you will notice people have been having issues with Pillow and have added it to their requirements.txt.

Running fastai2 on Colab, I have also seen this error.
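The version-comparison step above can be sketched in a few lines, assuming pinned `name==version` entries and Python 3.8+ for `importlib.metadata` (the Render image here uses 3.7, where `pkg_resources` would do the same job); the helper name is mine:

```python
# Sketch: report packages whose installed version differs from the
# version pinned in requirements.txt, so the deployed app can be made
# to match the training environment.
from importlib.metadata import version, PackageNotFoundError

def mismatches(requirements_lines):
    """Return (package, pinned, installed) triples that disagree;
    installed is None when the package is not present at all."""
    out = []
    for line in requirements_lines:
        line = line.strip()
        if not line or line.startswith('#') or '==' not in line:
            continue
        name, pinned = line.split('==', 1)
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            out.append((name, pinned, installed))
    return out
```

Run it against the lines of your requirements.txt on the training platform; an empty result means the two environments agree on every pinned package.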

Which version of Pillow are you installing, @LessW2020? As @mrfabulous1 mentioned, there are some bugs with the newest version; I'd double-check you're installing the same version :slight_smile: (you can also do pip show pillow to get the version info)

Thanks for the tip on how to solve. I’ll take a look this afternoon.

I ran pip install and updated my requirements.txt. After a bit of work, got the render image to install and … then back to the exact same error:

File "/usr/local/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 5, in <module>
from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
ImportError: cannot import name 'PILLOW_VERSION' from 'PIL' (/usr/local/lib/python3.7/site-packages/PIL/__init__.py)
Jan 27 08:08:30 PM error building image: error building stage: failed to execute command: waiting for process to exit: exit status 1
Jan 27 08:08:30 PM error: exit status 1

I’m running
torch==1.2.0
torchvision==0.4.0

but the error seems to be this missing PILLOW_VERSION no matter what torch/torchvision combo at this point.

You need the correct Pillow version. Don't use the latest. Pillow needs to be 6.2.1, @LessW2020. The latest version has that bug.
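For context: Pillow 7.0 removed the PILLOW_VERSION constant that torchvision < 0.5 still imports, which is why pinning 6.2.1 (the last 6.x release) works. A tiny illustrative check of that boundary (the helper is mine, not part of any library):

```python
# Sketch: torchvision < 0.5 does `from PIL import ... PILLOW_VERSION`,
# and PILLOW_VERSION was removed in Pillow 7.0, so any Pillow major
# version below 7 avoids the ImportError seen in the build log.
def pillow_compatible(version_string):
    """True if this Pillow version still exposes PILLOW_VERSION."""
    major = int(version_string.split('.')[0])
    return major < 7
```

So `pillow_compatible('6.2.1')` passes while `pillow_compatible('7.0.0')` does not; pinning `pillow==6.2.1` in requirements.txt is the practical fix.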

Thanks @muellerzr that finally got me past this pillow issue. I’m close to loading but hit a pickle error now - anyone have any idea on this?

File "app/server.py", line 48, in <module>
learn = loop.run_until_complete(asyncio.gather(*tasks))[0]
File "/usr/local/lib/python3.7/asyncio/base_events.py", line 583, in run_until_complete
return future.result()
File "app/server.py", line 35, in setup_learner
learn = load_learner(path, export_file_name)
File "/usr/local/lib/python3.7/site-packages/fastai/basic_train.py", line 618, in load_learner
state = torch.load(source, map_location='cpu') if defaults.device == torch.device('cpu') else torch.load(source)
File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 386, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 563, in _load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
Jan 27 08:48:01 PM error building image: error building stage: failed to execute command: waiting for process to exit: exit status 1
Jan 27 08:48:01 PM error: exit status 1
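As an aside, "invalid load key, '<'" means the first byte of the downloaded file is `<`, i.e. an HTML page (such as Google Drive's preview or confirmation page) was saved in place of the pickled model. A quick sanity check on the first bytes, assuming the legacy (non-zipfile) serialization format torch 1.x uses here (the helper name is mine):

```python
# Sketch: distinguish a real torch pickle from an HTML error page by
# inspecting the first byte of the downloaded data.
def looks_like_torch_pickle(data: bytes) -> bool:
    """Legacy torch.save streams begin with the pickle protocol-2
    opcode b'\\x80'; HTML pages begin with b'<'."""
    return data[:1] == b'\x80'
```

Reading the first few bytes of the file the server downloaded (or of the URL's response body) with this check tells you immediately whether the Drive link served the model or a web page.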

Thanks much,
Less

If I had to guess, first are we doing this in fastai1 or fastai2?

ah, fastai 1.0.60
I pushed the weights out to a google drive shared link per the directions.
Anyway, I wanted to use render to make deploying fast and easy but it’s been the opposite so far lol.
Let me know if you have any ideas otherwise I think I’m going to just give up and look at AWS or similar at this point.

Hmmm, and you've installed the same version you trained on? I'd recommend trying again (not in Render) and seeing if you can load_learner the pkl file as a start :slight_smile:

I'm running 1.0.61.dev0, but that isn't available as a pinnable requirement.
Thus I used 1.0.60 as the closest:

ERROR: Could not find a version that satisfies the requirement fastai==1.0.61 (from -r requirements.txt (line 4)) (from versions: 0.6, 0.7.0, 1.0.0b7, 1.0.0b8, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.9, 1.0.10, 1.0.11, 1.0.12, 1.0.13, 1.0.14, 1.0.15, 1.0.16, 1.0.17, 1.0.18, 1.0.19, 1.0.20, 1.0.21, 1.0.22, 1.0.24, 1.0.25, 1.0.26, 1.0.27, 1.0.28, 1.0.29, 1.0.30, 1.0.31, 1.0.32, 1.0.33, 1.0.34, 1.0.35, 1.0.36, 1.0.36.post1, 1.0.37, 1.0.38, 1.0.39, 1.0.40, 1.0.41, 1.0.42, 1.0.43.post1, 1.0.44, 1.0.46, 1.0.47, 1.0.47.post1, 1.0.48, 1.0.49, 1.0.50, 1.0.50.post1, 1.0.51, 1.0.52, 1.0.53, 1.0.53.post1, 1.0.53.post2, 1.0.53.post3, 1.0.54, 1.0.55, 1.0.57, 1.0.58, 1.0.59, 1.0.60)

Jan 27 09:14:13 PM ERROR: No matching distribution found for fastai==1.0.61 (from -r requirements.txt (line 4))

I would doubt that should matter. I did test exporting and loading the pkl file locally, but I'm not sure if this is some issue with it not pulling the file successfully from Google Drive or what.

Anyway, it feels like I'm flying blind on Render, as I can't really check anything, and every time I make a change, no matter how tiny (e.g. to server.py), it downloads and redoes the entire Docker process, which makes trying to fix things very slow…

Thanks for any help,
Less

Yes, I highly recommend building on your local machine before pushing it to the cloud. It makes it much faster to debug. (Run server.py on your Linux CLI.) Did you ensure that when you pulled locally you were on 1.0.60?

I don't have a Linux machine locally, lol. That's why I use salamander.ai :slight_smile:
Anyway, I think I'm giving up on this and will go read up on how to deploy on AWS.
Thanks for the help though!
Less

@LessW2020 I'm not sure if you're on Windows or not, but you can install a Linux CLI. Lifesaver! I think you'll run into the same issue there, though, so running locally may not be your best option :wink: I think it has to do with the file itself in your drive. You said it would load if you did a load_learner?

Correct - if I load and predict using the file on Salamander it works as expected.
What I can't tell is whether Render is able to pull the file successfully from Google Drive. When I try to check it manually I get a "no preview available" screen and a download button, but that's via browser, so I'm not sure if Render gets that same screen or if a wget or similar just pulls the file directly.
That's what I meant about making changes to server.py to add some debugging, but then having to remake the entire instance…
I have a Windows machine, so I'll look into installing Linux, but the other thing I can do is put the pkl file in an S3 bucket and see if Render can load from there.
Thanks for the help!
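On that Drive question: a link of the `.../file/d/<id>/view?usp=sharing` form returns the preview page, not the file, which is exactly what a wget-style fetch in server.py would receive. The usual workaround is rewriting it as a direct-download URL; a sketch (the URL pattern follows Drive's conventional link format, and the helper name is mine):

```python
# Sketch: convert a Google Drive share link into a direct-download URL
# so download_file() in server.py fetches the pickle, not an HTML page.
import re

def drive_direct_url(share_url):
    """Rewrite a Drive share link to its direct-download form;
    returns the input unchanged if it isn't a recognisable link."""
    m = re.search(r'/file/d/([^/]+)', share_url)
    if m is None:
        return share_url
    return 'https://drive.google.com/uc?export=download&id=' + m.group(1)
```

Pasting the rewritten URL into a browser should start a download immediately instead of showing the preview screen, which is the behaviour mrfabulous1 described earlier in the thread.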
