Some interesting research on this topic:
Neural Scene Representation and Rendering by DeepMind
Consistent GQN
Scene Representation Networks