A bit of a late comment, but this is how we use ULMFiT in production: https://github.com/inspirehep/inspire-classifier
It’s still based on fastai v0.7 (which we might change soon to v1). We deploy the whole thing to OpenShift and use a REST API to send text data and get the classification scores back. It’s slow, as the OpenShift instance is CPU-based, but we are trying to work around that.
Of course, I would appreciate any feedback and comments, especially on whether we can do better.
What are the benefits of using something like Starlette with async and await rather than e.g. Flask? My understanding of what async and await do is pretty fuzzy, and my team is more comfortable with Flask, so I’m wondering under what circumstances it makes sense to use Starlette.
I think it’s just more lightweight. If you are comfortable with Flask, go with Flask. I am currently experimenting with Responder, which is built on top of Starlette, just because I love the APIs that Kenneth Reitz designs.
There’s also some potential performance advantage in using async and await, no? I’d like to know how that performance advantage works. (We will need fairly high throughput for a small number of consumers.)
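The performance advantage comes from concurrency during waits: while one request is blocked on I/O (reading a request body, calling a downstream service), an async framework like Starlette can serve other requests on the same worker, whereas a sync Flask worker sits idle. Here is a toy, stdlib-only sketch of that effect, with `asyncio.sleep` standing in for an I/O wait (note that CPU-bound model inference itself does not benefit this way):

```python
import asyncio
import time

async def handle_request(i):
    # Simulated I/O wait; while this "request" waits, the event loop
    # is free to run the other requests.
    await asyncio.sleep(0.1)
    return f"response {i}"

async def serve_many(n):
    # All n requests wait concurrently, so total wall time is roughly
    # one 0.1 s wait, not n * 0.1 s as it would be with blocking calls.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(serve_many(10))
elapsed = time.perf_counter() - start
```

For a small number of consumers doing CPU-heavy inference, the throughput gain over Flask is likely modest; async shines when requests spend most of their time waiting.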
Note: if you try this, fastai models trained with <=1.0.44 won’t work on fastai >=1.0.46, so you need to be sure to use the right version of the Docker image. Additionally, Docker Hub’s automated build system is broken at the moment; I have been building and pushing manually, but you may want to build your own.
Hello @nok.
Can you please tell me how you pushed this app into production?
I know where the .ml domain comes from. I have a .tk one myself.
But how are you doing the predictions and everything with a UI on a .ml domain?
Are you forwarding it to another website?
I have been stuck on getting my idea into production for weeks. A little help would be appreciated.
Thanks
Hello everyone. I have to productionize a PyTorch BERT question-answering model. CPU inference is very slow for me, as for every query the model needs to evaluate 30 samples; out of the results for these 30 samples, I pick the answer with the maximum score. A GPU would be too costly for me to use for inference.
Can I leverage multi-core CPU inference for this?
If yes, what is the best practice for doing so?
If no, is there a cloud option that bills me only for the GPU queries I make, rather than for a continuously running GPU instance?
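For multi-core CPU inference, the usual levers are the PyTorch thread count, batching, and dynamic quantization. A toy sketch (the `Linear` model and shapes below are placeholder assumptions standing in for the real BERT QA model, not its actual API):

```python
import torch

# Intra-op parallelism: how many cores each matmul may use.
torch.set_num_threads(4)

model = torch.nn.Linear(128, 2).eval()  # stand-in for the BERT QA model

# Dynamic quantization converts Linear weights to int8 at load time,
# which typically speeds up transformer inference on CPU.
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Evaluate all 30 candidate samples in one batched forward pass instead
# of 30 separate calls, so each matmul is large enough to use all cores,
# then pick the highest-scoring sample.
batch = torch.randn(30, 128)
with torch.no_grad():
    scores = qmodel(batch)[:, 0]
best_idx = scores.argmax().item()
```

Batching the 30 samples into one forward pass is usually the biggest single win, since it amortizes per-call overhead and gives the BLAS backend larger matrices to parallelize.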
On the OpenVINO and Neural Compute Stick websites, I found that the supported frameworks are ONNX, TF, Caffe and MXNet, but PyTorch is not mentioned. To convert PyTorch -> ONNX -> OpenVINO, I found this workflow:
My question is: is this the same approach you followed to convert the fastai model, or is there another approach?
Hi Anurag, do you have any Flask examples? I have a Flask app running successfully on my localhost and am trying to understand how to upload it to Render. Thanks.
I am trying to change the model_dir of cnn_learner, because if I use the default path it warns about a read-only file system on Kaggle, and I want to export to tmp/models/ instead. How can I change the path, or am I doing something wrong?
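On Kaggle the input dataset directory is read-only, so the fix is to point the learner's model_dir at a writable location. A sketch, assuming fastai v1 and the usual Kaggle layout (the fastai lines are shown as comments since they need your `data` object):

```python
from pathlib import Path

# Create a writable models directory; /kaggle/working is the writable
# area on Kaggle, /tmp works too.
model_dir = Path("/tmp/models")
model_dir.mkdir(parents=True, exist_ok=True)

# Then point the learner at it (fastai v1):
# learn = cnn_learner(data, models.resnet34, model_dir=model_dir)
# learn.model_dir = model_dir           # or set it after creation
# learn.save("stage-1")                 # now saves under model_dir
# learn.export(model_dir/"export.pkl")  # export with an absolute path
```

model_dir is resolved relative to learn.path unless you give an absolute path, which is why the default lands inside the read-only dataset directory.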