Fastai V2 - Inference using get_preds instead of predict

Hi all,

Thanks, everyone, for all the amazing work and community support around FastAI. I'm curious whether anyone has tried using get_preds (instead of .predict) for inference at scale, combined with micro-batching to improve parallelization.

I’ve got some very simple code:

print("Loading model...")
model = load_learner('exported-model.pkl')
print("Model loaded!")
# Additional FastAPI code, including the route
# @app.post('/api', response_model=ResponseBody)
# async def predict(item: Item):
prediction = model.predict(string)

This works alright: roughly 800 ms per response under light load, but when I start hitting it at scale it slows down to around 15 seconds. My hope is to use get_preds to run predictions in bulk, but I'm stuck trying to figure out how to call it on the loaded learner and get results equivalent to .predict.
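
For reference, here's roughly the direction I've been trying. It's only a sketch, assuming the exported learner is a fastai v2 text classifier; the texts list is a hypothetical stand-in for strings collected from several requests.

from fastai.text.all import load_learner

learn = load_learner('exported-model.pkl')

# Hypothetical stand-in for the strings gathered from several requests
texts = ["first input", "second input", "third input"]

# test_dl applies the same preprocessing pipeline the learner was trained with
dl = learn.dls.test_dl(texts)

# One forward pass per batch instead of one call per item;
# with_decoded=True also returns the argmax class indices
probs, _, decoded = learn.get_preds(dl=dl, with_decoded=True)

# Map the indices back to class names (for a text classifier the label
# vocab is usually the last element of dls.vocab)
labels = [learn.dls.vocab[-1][int(i)] for i in decoded]

If that's right, the probabilities should line up with what .predict returns per item, just computed one batch at a time.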

Any suggestions or thoughts?
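
In case it helps frame the question, this is the micro-batching shape I have in mind on the FastAPI side: queue incoming requests and let a background task flush the queue every few milliseconds with a single get_preds call. It's only a sketch; Item, MAX_BATCH, FLUSH_EVERY, and the response shape are all made up, and the blocking get_preds call would probably need to move onto a thread executor in practice.

import asyncio
from fastai.text.all import load_learner
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
learn = load_learner('exported-model.pkl')
queue = None  # created once the event loop is running

MAX_BATCH = 32      # made-up cap on items per forward pass
FLUSH_EVERY = 0.05  # made-up flush interval in seconds

class Item(BaseModel):
    text: str

async def batch_worker():
    while True:
        await asyncio.sleep(FLUSH_EVERY)
        pending = []
        while not queue.empty() and len(pending) < MAX_BATCH:
            pending.append(queue.get_nowait())
        if not pending:
            continue
        texts = [text for text, _ in pending]
        dl = learn.dls.test_dl(texts)
        # This blocks the event loop while it runs; in practice it would
        # be pushed onto a thread executor with run_in_executor
        _, _, decoded = learn.get_preds(dl=dl, with_decoded=True)
        for (_, fut), idx in zip(pending, decoded):
            # Label vocab assumed to sit at the end of dls.vocab
            fut.set_result(str(learn.dls.vocab[-1][int(idx)]))

@app.on_event("startup")
async def start_worker():
    global queue
    queue = asyncio.Queue()  # create inside the running event loop
    asyncio.create_task(batch_worker())

@app.post('/api')
async def predict(item: Item):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((item.text, fut))
    return {"prediction": await fut}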