Hey guys, I’ve been working on this for the last few weeks!
This Colab partially reproduces the paper “Stereo Magnification: Learning View Synthesis using Multiplane Images” from a Google team in 2018. It is based on the code they published on GitHub and on the RealEstate10K dataset released alongside the paper.
I’ve ported the code to PyTorch and created a reduced version of the dataset, which, combined with a lower resolution and fewer planes, lets us train a model from scratch in this notebook in under 15 minutes. The resulting model produces Multiplane Images that show desirable characteristics such as the segmentation of planes by depth.
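For anyone unfamiliar with MPIs: the model predicts a stack of fronto-parallel RGBA planes at fixed depths, and the final image comes from compositing them back to front with the standard “over” operator. Here is a minimal PyTorch sketch of that compositing step (my own illustration, not code from the repo; I assume plane index 0 is the farthest plane):

```python
import torch

def composite_mpi(colors: torch.Tensor, alphas: torch.Tensor) -> torch.Tensor:
    """Composite an MPI into a single image, back to front.

    colors: (D, 3, H, W) per-plane RGB, index 0 = farthest plane (assumption)
    alphas: (D, 1, H, W) per-plane alpha in [0, 1]
    returns: (3, H, W) composited image
    """
    out = torch.zeros_like(colors[0])
    for c, a in zip(colors, alphas):
        # Standard "over" operator: new plane in front of what we have so far.
        out = c * a + out * (1 - a)
    return out
```

As a sanity check, if the frontmost plane is fully opaque the output is exactly that plane’s color, since everything behind it gets multiplied by zero.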
The original code was written in TensorFlow 1.x, which is now outdated, so porting it to PyTorch was an opportunity both to learn PyTorch and to give something useful back to the community. Using the original dataset is also not straightforward: only the metadata is provided, and you have to write and run code yourself to download the (77k) YouTube videos and capture the specific frames.
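To give a feel for that workflow, here is a rough sketch of parsing a RealEstate10K camera file and building an ffmpeg command to grab one referenced frame. The 19-column line layout (timestamp in microseconds, 4 normalized intrinsics, two unused fields, then a 3x4 pose matrix) is my reading of the dataset description, so double-check it against the actual files:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Frame:
    timestamp_us: int        # frame timestamp in microseconds (assumption)
    intrinsics: List[float]  # fx, fy, cx, cy, normalized (assumption)
    pose: List[float]        # 12 values of the 3x4 camera matrix, row-major

def parse_camera_file(text: str) -> Tuple[str, List[Frame]]:
    """Parse one RealEstate10K camera file: URL on line 1, one frame per line after."""
    lines = text.strip().splitlines()
    url = lines[0].strip()
    frames = []
    for line in lines[1:]:
        vals = line.split()
        frames.append(Frame(
            timestamp_us=int(vals[0]),
            intrinsics=[float(v) for v in vals[1:5]],
            pose=[float(v) for v in vals[7:19]],  # skip the two unused fields
        ))
    return url, frames

def ffmpeg_command(video_path: str, frame: Frame, out_path: str) -> List[str]:
    # Seek to the timestamp (converted to seconds) and grab a single frame.
    return ["ffmpeg", "-ss", f"{frame.timestamp_us / 1e6:.6f}",
            "-i", video_path, "-frames:v", "1", out_path]
```

You would first fetch each video (e.g. with a downloader like yt-dlp) and then run one such ffmpeg invocation per listed timestamp.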
Also, thanks for the awesome help I got in this thread; it helped me fix the issues I ran into.