Hey folks,
My question: is there an NN architecture that is narrower/shallower than resnet34/50 and still good enough for a simple image classifier (i.e., something like dogs vs. cats)? I am looking to speed up predictions that run on a CPU.
Background:
I’ve been spending a lot of time working with the PyTorch creator, Soumith, to get on-device inference working on an Android device. It’s been an uphill battle: much of the example code has bit-rotted, and most of the online examples no longer work.
Fortunately, I now have working model export code, plus working C++ code built with the Android NDK. I can finally do inference!
The issue right now, however, is that inference takes 3.8s on a single 256x256 image. I am currently using the resnet34 architecture.
My suspicion is that a simpler architecture (while it might lead to lower accuracy) could lead to faster predictions. I am targeting something along the lines of 0.5s per prediction as that would give me 2 fps.
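To put numbers on that target, here is a quick back-of-the-envelope sketch. The helper names `fps` and `required_speedup` are made up for illustration; the only inputs are the 3.8s and 0.5s figures above.

```python
def fps(seconds_per_prediction: float) -> float:
    """Frames per second if each prediction takes this many seconds."""
    return 1.0 / seconds_per_prediction


def required_speedup(current_s: float, target_s: float) -> float:
    """How many times faster inference must get to hit the target latency."""
    return current_s / target_s


current_s = 3.8  # measured: resnet34 on one 256x256 image
target_s = 0.5   # goal stated above

print(f"current: {fps(current_s):.2f} fps")
print(f"target:  {fps(target_s):.2f} fps")
print(f"needed:  {required_speedup(current_s, target_s):.1f}x faster")
```

So the gap to close is roughly 7.6x, which is why I doubt small tweaks alone will get me there.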
Any ideas?
More context:
- Python 3
- Using fastai 0.7
- resnet34
- Currently setting the engine on the exported .pb to NNPACK to optimize for mobile
- Exporting via: PyTorch -> ONNX -> Caffe2
- Using 256 x 256 images
- Doing CPU-based inference (hard constraint)
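For comparing candidate architectures under the same conditions, a minimal timing harness could look like the sketch below. This is an assumption about how one might measure it, not what I am running on the device: `benchmark` is a hypothetical helper, and `dummy_predict` is only a stand-in for a real model's forward pass on a 256 x 256 input.

```python
import statistics
import time


def benchmark(predict, sample, warmup=2, runs=10):
    """Time predict(sample) and report median latency and implied fps."""
    # Warm-up calls are not timed, so one-off setup costs do not skew results.
    for _ in range(warmup):
        predict(sample)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)

    median_s = statistics.median(timings)
    # Guard against a zero reading on very coarse timers.
    rate = 1.0 / median_s if median_s > 0 else float("inf")
    return {"median_s": median_s, "fps": rate, "runs": runs}


def dummy_predict(image):
    # Stand-in workload: replace with the real model call being compared.
    return sum(sum(row) for row in image)


# A 256 x 256 grid of floats standing in for one input image.
image = [[0.5] * 256 for _ in range(256)]

result = benchmark(dummy_predict, image)
print(f"median: {result['median_s']:.4f}s, about {result['fps']:.1f} fps")
```

Using the median rather than the mean keeps one slow run (e.g. from CPU throttling) from dominating the number.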