Accelerating neural network inference on phones

I wanted to build a style transfer model that runs on a phone in real time (20 fps). The model (based on the “Perceptual losses for real time style transfer” paper: https://arxiv.org/abs/1603.08155) consists only of convolutional layers, but it is still pretty slow to run on device, so I was looking for a way to accelerate the inference time of convolutional layers. That search led me to this paper: https://arxiv.org/abs/1505.06798
It reports roughly a 4x speedup in inference with only a 0.3% drop in accuracy. I am still looking for more techniques. Can anyone help me out?
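
For context on what that paper does: it approximates each convolutional layer with a low-rank decomposition, so one expensive convolution becomes two cheaper ones. Below is a minimal sketch of the simplest variant of that idea, a truncated SVD over the flattened filter matrix. This is my own simplified illustration (function name and tensor shapes are made up); the paper itself minimizes reconstruction error on the layer’s actual responses and accounts for the nonlinearity, which this skips:

```python
import numpy as np

def low_rank_conv_decomposition(W, rank):
    """Split a conv weight tensor W of shape (d, c, k, k) into a
    (rank, c, k, k) conv followed by a (d, rank, 1, 1) conv, using a
    truncated SVD of the flattened filter matrix."""
    d, c, k, _ = W.shape
    W_mat = W.reshape(d, -1)          # (d, c*k*k), one filter per row
    U, S, Vt = np.linalg.svd(W_mat, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]      # fold singular values into U: (d, rank)
    Vt_r = Vt[:rank, :]               # (rank, c*k*k)
    W1 = Vt_r.reshape(rank, c, k, k)  # first conv: rank filters of size (c, k, k)
    W2 = U_r.reshape(d, rank, 1, 1)   # second conv: 1x1 mixing rank -> d channels
    return W1, W2

# Hypothetical layer: 128 output channels, 64 input channels, 3x3 kernels.
W = np.random.randn(128, 64, 3, 3).astype(np.float32)
W1, W2 = low_rank_conv_decomposition(W, rank=32)
# Multiply-adds per output position: 128*64*9 = 73,728 before vs.
# 32*64*9 + 128*32 = 22,528 after, roughly a 3.3x reduction.
```

The speedup comes from the rank: per output position the cost drops from d·c·k·k to rank·c·k·k + d·rank multiply-adds, so the smaller the rank you can get away with, the faster the layer.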

@singlasahil14

Hey,

  1. Facebook released Caffe2, which was designed to speed up neural network inference on mobile devices ( https://caffe2.ai/ ). I think they announced at the F8 conference that it is 10x faster than existing frameworks, but I cannot find the exact video right now.
  2. I think TensorFlow offers similar functionality as well; these talks cover running models on mobile (a sketch of 8-bit weight quantization, one of the standard mobile tricks, follows after this list):
    https://www.youtube.com/watch?v=0r9w3V923rk
    https://www.youtube.com/watch?v=kAOanJczHA0
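
A common technique behind these mobile speedups is 8-bit weight quantization: store each weight tensor as integers plus a scale, which shrinks the model 4x versus float32 and allows faster integer arithmetic on mobile CPUs. Here is a minimal NumPy sketch of the idea; this is my own illustration with made-up names and shapes, not actual Caffe2 or TensorFlow code:

```python
import numpy as np

def quantize_weights(w, num_bits=8):
    """Linearly quantize a float tensor to unsigned integers.

    Maps [w.min(), w.max()] onto [0, 2**num_bits - 1] and returns the
    integer tensor plus the (scale, offset) needed to dequantize."""
    qmax = 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / qmax or 1.0  # guard against constant tensors
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_weights(q, scale, w_min):
    """Recover an approximate float tensor from the quantized form."""
    return q.astype(np.float32) * scale + w_min

# Example: a conv weight tensor stored at 1/4 the size of float32.
w = np.random.randn(128, 64, 3, 3).astype(np.float32)
q, scale, w_min = quantize_weights(w)
w_approx = dequantize_weights(q, scale, w_min)
print("max abs error:", np.abs(w - w_approx).max())  # at most ~scale/2
```

The trade-off is a small rounding error per weight (at most half the scale), which in practice costs little accuracy at inference time.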

Best,
Benedikt