The logs window is empty.
The model file is 470 MB, so I could see that leading to memory issues. It’s a super-resolution model, so you’ve got the whole U-Net structure. What would be a way forward from here?
We’re working on surfacing OOMKilled events in the UI logs. In terms of what to do next: we’re going to introduce higher memory tiers soon, but you won’t be able to use the current model unless it’s somehow modified to take less than 1 GB of RAM. What happens when you run it locally? Can you monitor the memory to figure out how much it ends up taking?
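One quick way to check locally is to watch the process’s resident memory around the model load. A minimal sketch, assuming psutil is installed (the model path here is hypothetical):

import os
import psutil
import torch

proc = psutil.Process(os.getpid())
print(f"RSS before load: {proc.memory_info().rss / 2**20:.0f} MiB")

# fastai must be importable for the pickled Learner to deserialize.
model = torch.load('models/export.pkl', map_location='cpu')

print(f"RSS after load: {proc.memory_info().rss / 2**20:.0f} MiB")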
@KarlH, for now we’ve increased the memory limit for all Docker deploys to 1.5 GB. We’ll eventually create a priced tier where you’d end up paying more than $5, but you won’t have to until we do. Try deploying your service again?
https://enhance-u2e7.app.render.com/ is up after the limit increase.
Thanks for working on this Anurag, I appreciate it!
@anurag
I am using the new API with a pickle file but end up getting the following error:
File "/usr/local/lib/python3.6/site-packages/torch/serialization.py", line 78, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.
Jan 21 01:23:00 PM error building image: error building stage: waiting for process to exit: exit status 1
Jan 21 01:23:00 PM error: exit status 1
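The message points at torch.load’s map_location argument; a minimal sketch of that call, with a hypothetical path:

import torch

# Remap tensors saved on a GPU onto the CPU while unpickling.
state = torch.load('models/model.pth', map_location='cpu')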
Here is how I modified your server.py file:
async def setup_learner():
    await download_file(model_file_url, path/'models'/f'{model_file_name}.pkl')
    defaults.device = torch.device('cpu')
    learn = load_learner(path/'models')
    return learn
Could you also update or create an alternate example file with load_learner() from fastai 1.0.40, please?
Thanks for Render!
Done. Thanks for letting me know!
FYI, uvicorn changed their API, so you might have to get my latest changes from origin. Specifically, this commit:
Due to the deprecation of the single_from_classes method of DataBunch, please note that you now need to use the .pkl file given by learner.export() rather than the .pth file given by learner.save(), and load it with load_learner. Here’s what the setup_learner function should look like in server.py:

async def setup_learner():
    await download_file(model_file_url, path/'models'/f'{model_file_name}.pkl')
    defaults.device = torch.device('cpu')
    learn = load_learner(path/'models', f'{model_file_name}.pkl')
    return learn
This is written mainly so that @anurag can update the GitHub repo and the tutorial, but if others are still using the old version, be careful: it won’t work properly. It doesn’t throw an error, but the model isn’t working as it should.
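For completeness, the export side in the training notebook is just the following (the Learner variable name and its path are whatever you trained with):

# Serializes the whole Learner (model, transforms, classes) into export.pkl
# under learn.path, instead of saving only the weights with learn.save().
learn.export()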
The error you pointed out was due to how the export and load_learner methods were implemented in fastai. I’ve submitted a PR that fixes this; the error should go away when (and if) the PR is merged.
I’ve updated https://github.com/render-examples/fastai-v3 but it’ll need the PR to be merged to work.
Just saw that I forgot something needed to make it work once the PR is merged (telling load_learner to use the CPU). The setup_learner function should be:
async def setup_learner():
    await download_file(model_file_url, path/'models'/f'{model_file_name}.pkl')
    learn = load_learner(path/'models', f'{model_file_name}.pkl', device='cpu')
    return learn
You can also delete defaults.device = torch.device('cpu'), I think.
We’ll need to create a new release of fastai to get this to work with the default notebook instructions. cc @jeremy
Please note I have changed it to a flag, so it should be
learn = load_learner(path/'models', f'{model_file_name}.pkl', cpu=True)
since it didn’t support arbitrary devices (just CPU). There will be a release today.
I’m updating the repo at https://github.com/render-examples/fastai-v3 to make sure everything works as intended. Will post on this thread when it’s ready.
FYI, thanks to @sgugger making some quick and important updates to fastai, https://github.com/render-examples/fastai-v3 is working again. Everyone who’s forked the sample repo may want to update their forks. See instructions here: https://robots.thoughtbot.com/keeping-a-github-fork-updated
cc @PierreO
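In short, the steps from that link (assuming your fork’s default branch is master and you name the remote upstream):

git remote add upstream https://github.com/render-examples/fastai-v3.git
git fetch upstream
git checkout master
git merge upstream/master
git push origin master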
Great job from you both, thanks!
For anyone running into this error on deployment:
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.
Make sure to update the version of fastai to 1.0.42 (or the latest) in your notebook environment, restart the kernel, and export the .pkl file again.
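In a notebook, the update can be as simple as the following (a sketch; conda users should update through conda instead):

!pip install --upgrade "fastai>=1.0.42"

then restart the kernel and re-run learn.export().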
Just FYI, I’ve made a few changes to the sample repo. Please update your forks accordingly.
@anurag, I am on fastai 1.0.42, and using all the latest files, including your latest server.py forked this morning.
sudo /opt/anaconda3/bin/conda list fastai
fastai 1.0.42 1 fastai
The export.pkl file that I generated still fails with the same error listed above when deployed on Render. I have repeated the process with the same result. In my modified course-v3/lesson2-download Jupyter notebook, the following works just fine:
learn.export()
img = open_image(path/'samoyed'/'00000057.jpg')
learn = load_learner(path)
pred_class, pred_idx, outputs = learn.predict(img)
pred_class
Am I overlooking something? It seems like I have done everything as required.
Thanks,
Jeff