Slow inference

I was able to train a text classification model (fine-tuned on top of a Wikipedia language model) following chapter 8 and deploy it on AWS Lambda, but inference is just too slow: each prediction takes approximately 400 milliseconds. Inference on my SageMaker notebook instance with a GPU is similarly slow. I must be doing something wrong. Any ideas?

My model is 175 MB.

How are you loading the model in? If it’s with load_learner, make sure the cpu flag is set to False.
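For reference, a minimal sketch of what that looks like (assuming a fastai v2 export; the `export.pkl` path and the sample text are just placeholders):

```python
from fastai.text.all import load_learner

# Load the exported learner onto the GPU rather than the CPU.
# cpu=False only takes effect when a CUDA device is actually available.
learn = load_learner('export.pkl', cpu=False)

# Run a single prediction; the first call also pays a one-time
# warm-up cost, so time a later call when benchmarking latency.
pred, pred_idx, probs = learn.predict("Some sample text to classify")
```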


It’s with load_learner. Thanks, let me try that.

Not much improvement on AWS Lambda. @muellerzr, the better question would be: is there a way to get a GPU on AWS Lambda containers?