On-device inference is slow - ideas on how to speed it up?

Hey folks,

My question is: what NN architecture, narrower/shallower than resnet34/50, would work for a simple image classifier (i.e. something like dogs vs. cats)? I am looking to speed up predictions that run on a CPU.

Background:

I’ve been spending a lot of time working with the PyTorch creator, Soumith, to get on-device inference working on an Android device. It’s been an uphill battle, as much of the example code has bit-rotted to the point that most of the online examples no longer work.

Fortunately, I’ve now got working model export code, plus working C++ code built against the Android NDK. I can finally do inference!

The issue right now, however, is that inference takes 3.8s on a single 256x256 image. I am currently using the resnet34 architecture.

My suspicion is that a simpler architecture (while it might lead to lower accuracy) could lead to faster predictions. I am targeting something along the lines of 0.5s per prediction as that would give me 2 fps.

Any ideas?

More context:

  • Python 3
  • Using fastai 0.7
  • resnet34
  • Setting the engine on the exported .pb to NNPACK to optimize for mobile
  • Exporting via: PyTorch -> ONNX -> Caffe2
  • Using 256 x 256 images
  • Doing CPU-based inference (hard constraint)

Would MobileNetV2 work?

Is that implemented in fastai 0.7?

Not out of the box, but you can grab one of the PyTorch implementations from GitHub (there are quite a few), load the initialised PyTorch model and weights, then cut off the last layer and add a “custom head”: a final layer that matches the MobileNetV2 penultimate layer, sized to the number of classes you are predicting. You can then create the learner object as before from this model and your data.

I think these “cutting” and “custom head” operations were described in the fastai Part 2 course – can’t remember if they were used/introduced in Part 1 or not.

It’ll be interesting to know how resnet34 and MobileNetV2 compare with each other when used in fastai. Please do update us here with your progress (and questions!). Thanks.