Which architecture for image-sequence-to-image-sequence generation?

I’m working on a problem that involves taking a sequence of satellite images of an area and generating a sequence of new images that predict the scene a few days ahead. I’m still new to deep learning and don’t know which models are best suited to this task.
My intuition is to use an RNN architecture, as covered in the lectures. I’ve read a bit about PixelRNN; might it be a good fit?
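
To make the setup concrete, this is roughly how the training pairs could be framed. It is only a sketch: the function name `make_sequence_pairs` and the window lengths (5 frames in, 3 frames out) are placeholders chosen for illustration, and plain integers stand in for the actual image arrays.

```python
def make_sequence_pairs(frames, n_in, n_out):
    """Slide a window over time-ordered frames.

    Returns a list of (inputs, targets) pairs, where `inputs` holds
    n_in consecutive frames and `targets` holds the n_out frames
    that immediately follow them.
    """
    pairs = []
    window = n_in + n_out
    # If there are fewer frames than one full window, the range is empty.
    for start in range(len(frames) - window + 1):
        inputs = frames[start:start + n_in]
        targets = frames[start + n_in:start + window]
        pairs.append((inputs, targets))
    return pairs


# Placeholder data: each integer stands in for one satellite image.
frames = list(range(10))
pairs = make_sequence_pairs(frames, n_in=5, n_out=3)

for inputs, targets in pairs:
    print("input frames:", inputs, "-> target frames:", targets)
```

Whatever model is used would then need to map each input sequence to its target sequence.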