I am working on a small toy project that involves object detection as one of its steps.
Currently, I am using a pre-trained Faster R-CNN from Detectron2 with a ResNet-101 backbone.
I wanted to make an MVP and show it to my colleagues, so I thought of deploying my model on a CPU machine. Detectron2 models can be easily converted to Caffe2 (DOCS) for deployment.
I measured the inference time in both GPU and CPU mode. The original Detectron2 model running in PyTorch on the GPU takes around 90 ms per image on my RTX 2080 Ti.
The converted model running on the CPU (i9 9940X) through the Caffe2 API takes 2.4 s. I read that Caffe2 is optimized for CPU inference, so I am quite surprised by this number. I asked about it on the Detectron2 GitHub and got an answer along these lines: "Expected inference time of R-50-FPN Faster R-CNN on a 8 core CPU is around 1.9s. Usually, ResNets are not used on CPUs."
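For context, timings like the ones above can be collected with a harness along the following lines. This is a minimal sketch rather than my exact script: `measure_latency` and `fake_model` are hypothetical names, and `fake_model` only stands in for the real Detectron2 or Caffe2 predictor call. It runs a few warm-up iterations first and reports the median, since the first calls are often slower. When timing on a GPU, the device also has to be synchronized before reading the clock (for example with `torch.cuda.synchronize()`), which is left out here.

```python
import statistics
import time


def measure_latency(predict, sample, warmup=3, runs=10):
    """Return the median wall-clock latency of predict(sample) in seconds."""
    # Warm-up calls are not timed: they absorb one-off costs such as
    # lazy initialization and cache population.
    for _ in range(warmup):
        predict(sample)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)

    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(timings)


def fake_model(image):
    # Hypothetical stand-in for the real predictor; replace with the
    # actual model call when measuring.
    return sum(image)


if __name__ == "__main__":
    latency = measure_latency(fake_model, list(range(10_000)))
    print(f"median latency: {latency * 1000:.3f} ms")
```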
So here is my question: how should such deep learning solutions for computer vision be deployed in the real world? I read somewhere that Facebook uses Caffe2 for its production models because CPUs are much cheaper than GPUs (of course they are), but the difference in running time is huge. Using a CPU for object detection seems useless for any real-time application.
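To put the gap in numbers, here is a quick back-of-the-envelope calculation using only the two latencies I measured (90 ms on GPU, 2.4 s on CPU). The helper name `frames_per_second` is just for illustration, and the calculation assumes one image is processed at a time with no batching. It works out to roughly 11 images per second on the GPU versus about 0.4 on the CPU, a slowdown of about 27x.

```python
def frames_per_second(latency_s):
    """Throughput for sequential, single-image inference."""
    return 1.0 / latency_s


gpu_latency_s = 0.090  # measured: Detectron2 + PyTorch on RTX 2080 Ti
cpu_latency_s = 2.4    # measured: Caffe2 export on i9 9940X

slowdown = cpu_latency_s / gpu_latency_s

if __name__ == "__main__":
    print(f"GPU: {frames_per_second(gpu_latency_s):.2f} FPS")
    print(f"CPU: {frames_per_second(cpu_latency_s):.2f} FPS")
    print(f"CPU is {slowdown:.1f}x slower")
```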
Should I use some other architecture that does not involve ResNet or Faster R-CNN (such as YOLO v3/v4, SSD, etc.)? Or should the original GPU-trained model be converted to ONNX and then run in a more CPU-optimized framework such as OpenVINO? Or are there other tweaks, such as quantization or pruning, that are necessary to boost the CPU inference efficiency of production models?
I know that this is just a toy project (for now at least), so I can use a GPU for inference (quite costly in real applications?) or just switch to another architecture (but sacrifice accuracy). I am just wondering what the go-to solution is for real-world systems.
Thanks in advance for sharing your knowledge and experience. I will be grateful for any hints!