Hi, I’m still quite new to deep learning. I’m trying to reproduce a paper on light fields: Stereo Magnification: Learning View Synthesis using Multiplane Images. The paper’s code is available, but it’s written in TensorFlow, so I thought reimplementing it would be a good project, since I’m also interested in light fields / view synthesis.
I’ve been fairly successful, in the sense that I have a Colab notebook that trains the model, and it seems to work to a certain degree.
The problem is that each epoch takes about 30 minutes to train, and I wonder if you have any ideas on how to speed up training (maybe a smaller model?). The model is defined just as in the paper, and it comes out to about 4.5 million parameters across 18 convolution layers.
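(In case it matters, the 4.5M figure is just the total element count over the network’s parameters; assuming a PyTorch reimplementation, I count it roughly like this:)

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # Total number of trainable parameters in the network
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```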
I’ve actually tried a smaller model with just 7 layers and 1.5 million parameters, but it’s still just as slow. Furthermore, for some reason I can only use a batch size of 1; larger batch sizes trigger out-of-memory errors.
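To be clear about how I’m checking memory, this is roughly the kind of probe I run around one training step (again assuming PyTorch; `model` and `batch` are just placeholders for my network and one training sample):

```python
import torch

def report_peak_memory(model, batch):
    # Run one forward/backward pass and print peak GPU memory used.
    torch.cuda.reset_peak_memory_stats()
    out = model(batch)
    out.mean().backward()  # dummy scalar loss, just to force gradient allocation
    peak_mib = torch.cuda.max_memory_allocated() / 1024 ** 2
    print(f"peak GPU memory: {peak_mib:.0f} MiB")
```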
Note that, effectively, the input of the model is 11 images and the output is 10 images: the multiplane image (MPI), which can be used to synthesize images from new viewpoints simply by alpha compositing (see the sketch below). However, the model is only about 18 MB, and 10 or 11 images at a resolution of 220×220 doesn’t seem like that much data, or does it?
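To make the alpha-composition step concrete, what I mean is the standard back-to-front “over” compositing of the MPI planes, something like this simplified sketch (not my exact code; plane ordering and channel layout are assumptions):

```python
import torch

def composite_mpi(planes: torch.Tensor) -> torch.Tensor:
    """
    planes: (D, 4, H, W) multiplane image: D fronto-parallel RGBA planes
            ordered back to front, with alpha in [0, 1].
    Returns a (3, H, W) image composited with the standard "over" operation.
    """
    out = torch.zeros_like(planes[0, :3])
    for rgba in planes:                   # iterate back to front
        rgb, alpha = rgba[:3], rgba[3:4]  # alpha broadcasts over the RGB channels
        out = rgb * alpha + out * (1.0 - alpha)
    return out
```

(In the full rendering pipeline each plane is first warped into the target view; this sketch only shows the compositing part.)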