Accelerating neural network inference on a phone

I wanted to build a style transfer model that runs on a phone in real time (20 fps). The model (based on the "Perceptual Losses for Real-Time Style Transfer" paper) involves only convolutional layers, but it is still quite slow to run on device, so I have been looking for ways to accelerate the inference time of convolutional layers. While searching, I came across this paper:
It speeds up neural network inference by a factor of four, with only a 0.3% drop in accuracy. I am still looking for more techniques. Can anyone help me out?
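For context, the transformation network in the perceptual-losses paper is fully convolutional: a few strided convolutions to downsample, a stack of residual blocks, and transposed convolutions to upsample. Here is a minimal PyTorch sketch of that shape; the exact channel counts, number of residual blocks, and use of instance norm are assumptions for illustration, not necessarily the asker's configuration:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 conv -> norm -> ReLU -> 3x3 conv -> norm, with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

class TransformerNet(nn.Module):
    """Downsample -> residual blocks -> upsample; convolutional layers only."""
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 9, stride=1, padding=4),    # 9x9 stem conv
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),   # downsample 2x
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),  # downsample 2x
            nn.ReLU(inplace=True),
            *[ResidualBlock(128) for _ in range(5)],     # residual trunk
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, 9, stride=1, padding=4),    # back to 3 channels
        )

    def forward(self, x):
        return self.model(x)

# Quick shape check: output matches the input resolution.
net = TransformerNet().eval()
out = net(torch.randn(1, 3, 256, 256))  # -> torch.Size([1, 3, 256, 256])
```

Nearly all of the inference cost sits in those convolutions, which is why conv-specific acceleration techniques are the relevant ones here.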

  1. Facebook released Caffe2, which was designed to speed up neural network inference on mobile devices ( ). I think they announced at the F8 conference that it is 10x faster than current frameworks, but I cannot find the exact video right now.
  2. I think TensorFlow supports similar mobile functionality as well:
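As an illustration of the kind of mobile-oriented tooling mentioned above, here is a minimal sketch of converting a model with the TensorFlow Lite converter and its default post-training quantization, which stores weights in 8 bits (roughly a 4x size reduction versus float32) and can speed up on-device inference. The tiny `model` and the output filename are placeholders for illustration; treat this as one option, not necessarily the specific functionality the answer refers to:

```python
import tensorflow as tf

# Placeholder stand-in for a trained model (e.g. the style transfer net);
# in practice you would load your own trained tf.keras.Model here.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 9, padding="same", activation="relu",
                           input_shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(3, 9, padding="same"),
])

# Convert to TensorFlow Lite with default optimizations, which apply
# post-training quantization to the weights.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The resulting flatbuffer can be bundled into an iOS/Android app and run
# with the TFLite interpreter.
with open("style_transfer.tflite", "wb") as f:
    f.write(tflite_model)
```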