Neural Networks for Stereo to Multiview View Synthesis

Nic Dahlquist

I took Parts 1 and 2 last year, and wanted to share my experience applying these skills in industry. I’m currently working at a startup called Leia, which makes glasses-free 3D displays (technically, automultiscopic displays). One challenge is that while plenty of movies are available in stereo 3D (two views), we would like to render “in-between” views for a better user experience.

When I started on this project, we were using a non-machine-learning approach that took about 1 second per frame on a smartphone CPU. My idea was to train a convnet to replace this code and run it on the smartphone’s GPU for a speedup.

Some things that proved helpful:

  • A W-net, to get a wide receptive field while keeping good output resolution.
  • Separable convolutions, to make the model smaller and faster.
  • The “selection layer” from Deep3D.
  • Perceptual loss.
  • A fully convolutional network, so we can run it on images with different aspect ratios.
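To give a rough idea of what a Deep3D-style selection layer does, here is a minimal NumPy sketch. It assumes the network outputs, for every pixel, one logit per candidate disparity; the layer turns those logits into a per-pixel probability distribution with a softmax, builds horizontally shifted copies of the input view (one per disparity), and blends the copies using those probabilities. The function names, the integer disparity set, the sign convention (positive shifts move content right) and the edge-replicating padding are all assumptions made for illustration; this is not the production code described in this post, which ran as a converted TensorFlow/Keras model.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def shift_horizontal(image, d):
    """Shift an (H, W, C) image right by d pixels (left if d < 0).

    Columns exposed by the shift are filled by replicating the edge column
    (an assumed padding choice). Requires |d| < W.
    """
    H, W, C = image.shape
    assert abs(d) < W, "disparity must be smaller than the image width"
    if d == 0:
        return image.copy()
    out = np.empty_like(image)
    if d > 0:
        out[:, d:] = image[:, :W - d]
        out[:, :d] = image[:, :1]          # replicate left edge column
    else:
        k = -d
        out[:, :W - k] = image[:, k:]
        out[:, W - k:] = image[:, -1:]     # replicate right edge column
    return out


def selection_layer(image, logits, disparities):
    """Blend disparity-shifted copies of `image` with per-pixel softmax weights.

    image:       (H, W, C) input view
    logits:      (H, W, D) network output, one channel per candidate disparity
    disparities: sequence of D integer shifts (hypothetical candidate set)
    returns:     (H, W, C) synthesized view
    """
    probs = softmax(logits, axis=-1)                            # (H, W, D)
    shifted = np.stack([shift_horizontal(image, d) for d in disparities],
                       axis=-1)                                 # (H, W, C, D)
    # Per-pixel weighted sum over the disparity axis.
    return np.einsum("hwcd,hwd->hwc", shifted, probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.random((4, 6, 3))
    disparities = [-1, 0, 1, 2]
    logits = rng.normal(size=(4, 6, len(disparities)))
    novel_view = selection_layer(left, logits, disparities)
    print(novel_view.shape)
```

Because the output is a softmax-weighted sum of shifted inputs, the whole operation is differentiable, which is what lets Deep3D learn disparity end to end from stereo pairs without ground-truth depth. It also places no constraint on H or W, so it fits naturally into a fully convolutional network.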

To run on the GPU, we used Qualcomm’s Snapdragon Neural Processing Engine. It includes scripts that convert a trained TensorFlow/Keras model to Qualcomm’s proprietary format, plus an Android SDK that can run the converted model on the CPU, GPU, or DSP. Running on the GPU gave us a huge performance boost over the CPU: we are able to run the network in real time on a live video stream.

It has been very exciting to ship my first production project using neural networks (also the first for our company!). We hope to make many improvements, and we have plenty of ideas for other applications of neural networks to 3D content. If this sounds interesting to you, please contact me at nicolas.dahlquist [at]

Thank you for reading 🙂