Fastai serving

drewbo · July 16, 2019, 5:52pm

We’ve been trying to improve the way we run fastai inference at large scale and just released fastai-serving, a tool for creating Docker images that serve models in the style of TensorFlow Serving. Would love to know if this is helpful for anyone looking to do large-scale inference, or if you have any questions or feedback. In general, we agree with @jeremy’s advice that CPU inference is often more appropriate, depending on the scale of your problem, but this repo can be helpful when doing predictions over large batches of imagery.
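For anyone unfamiliar with the TensorFlow Serving style, here is a rough sketch of how a client might build a request body. The `{"instances": [...]}` wrapper and the `{"b64": ...}` encoding for binary data follow the TensorFlow Serving REST convention; the `image_bytes` input name and the `build_predict_body` helper are assumptions made up for illustration, so check the repo for the exact shape it accepts.

```python
import base64
import json


def build_predict_body(images):
    """Build a TensorFlow Serving style JSON body from raw image bytes.

    `image_bytes` is a hypothetical input name used only for illustration.
    """
    instances = [
        # Binary values are base64-encoded and wrapped in a {"b64": ...} object.
        {"image_bytes": {"b64": base64.b64encode(img).decode("ascii")}}
        for img in images
    ]
    return json.dumps({"instances": instances})


# Placeholder bytes standing in for a real image file's contents.
body = build_predict_body([b"\x89PNG fake bytes"])
```

The resulting string would be sent as the POST body to whatever predict endpoint the container exposes.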

Thanks to @sgugger for answering so many questions on the forum, particularly around the topic of production inference. And thanks to @anurag for providing an example API that we borrowed quite liberally from.

jeremy · July 16, 2019, 8:41pm

Thanks for sharing! Looking forward to hearing how folks go with trying this.

(One extremely minor suggestion: call it “fastai serving” instead of “Fast.ai serving”, and refer to “fastai models” instead of “fast.ai models”, since the software library is called “fastai”, whereas the company is called “fast.ai”.)

drewbo · July 16, 2019, 9:26pm

Thanks for the feedback @jeremy, I’ve updated the repo to reflect that distinction.

wgpubs · July 17, 2019, 7:04pm

This looks really great.

I’m in the process of determining how to deploy a number of NLP models that currently require a GPU, and I can see something like this being of use. Do you have any examples of deploying this to AWS? Any lessons learned and/or best practices?

drewbo · July 18, 2019, 3:16am

Thanks @wgpubs!

Our largest scale deployments are done on AWS using chip-n-scale: queue arranger, another open source tool that we developed specifically for running satellite imagery inference at large scale. Our motivation for developing that framework was:

Imagery download (or I/O) speed can be an inference bottleneck, and is sometimes bound by CPU. Parallelizing across Lambda functions can help solve this.
Once you have control over your imagery input speed, you can fine-tune it to keep your GPUs maximally utilized, so you don’t waste money keeping them reserved while they sit idle (or, if you have a dedicated machine, so you save time).
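To make the two points above concrete, here is a minimal single-machine sketch of the idea. It is not how chip-n-scale works internally: it swaps Lambda functions for a local thread pool, and `fetch_tile`, `predict_batch`, `download_workers`, and `batch_size` are hypothetical names chosen for illustration. The number of download workers plays the role of the knob you tune so the GPU-bound step is not left waiting on I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def batched(items, batch_size):
    """Group an iterable into lists of at most batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_inference(tile_ids, fetch_tile, predict_batch,
                  download_workers=8, batch_size=4):
    """Download tiles in parallel and feed them to the model in batches."""
    results = []
    with ThreadPoolExecutor(max_workers=download_workers) as pool:
        # map() submits every download up front and yields results in input
        # order, so later tiles keep downloading while earlier batches run.
        tiles = pool.map(fetch_tile, tile_ids)
        for batch in batched(tiles, batch_size):
            results.extend(predict_batch(batch))
    return results


# Stand-ins for a real downloader and a real model call.
def fake_fetch(tile_id):
    return f"pixels-{tile_id}"


def fake_predict(batch):
    return [len(tile) for tile in batch]


predictions = run_inference(range(10), fake_fetch, fake_predict,
                            download_workers=4, batch_size=4)
```

In practice you would raise or lower `download_workers` while watching GPU utilization, stopping once the model is no longer waiting on downloads.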

My colleague has a more detailed explanation of the project that led us to this style of workflow (though in that case, we were using TensorFlow rather than fastai).

Let me know if this helps at all and hope your projects go well.