I believe @johnowhitaker mentions in the Deep Dive video that the original model was trained using images set up that way, so we have to follow the same procedure to stay consistent with how the model was trained. But I’d have to look up the video to be certain, since I’m going by memory.
Update: I’m watching the Lesson 10 video and Jeremy explains this at around the 46:00 mark. I was wrong about the calculation you pointed to and about what Johno said. (What Johno mentioned was the multiplication and later division by 0.18215.) Apparently the VAE outputs values in the range -1 to 1, and the calculation you pointed to converts those values to the range 0 to 1, as a step toward the format the Python Imaging Library (PIL) requires.
The decoded image is represented as floats between -1 and 1. That line maps them to (0, 1). Then we rearrange the channels, multiply by 255 and turn the result into ints to get it in the format expected by something like PIL. So it's just fluff for converting between ways of representing an image.
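In case it helps, here is a minimal sketch of those steps in NumPy. The function name `decoded_to_uint8`, the `(N, C, H, W)` layout, and the random `fake_decoded` array standing in for the VAE decoder output are my own assumptions for illustration, not the notebook's actual code.

```python
import numpy as np


def decoded_to_uint8(decoded: np.ndarray) -> np.ndarray:
    """Convert floats in [-1, 1], shaped (N, C, H, W), to uint8 shaped (N, H, W, C)."""
    # Map (-1, 1) to (0, 1); clip in case the decoder overshoots slightly.
    scaled = np.clip(decoded / 2 + 0.5, 0.0, 1.0)
    # Move channels last, the layout image libraries such as PIL expect.
    channels_last = scaled.transpose(0, 2, 3, 1)
    # Scale to 0..255 and convert to 8-bit integers.
    return (channels_last * 255).round().astype(np.uint8)


# Hypothetical stand-in for the decoder output: one 3-channel 64x64 image.
fake_decoded = np.random.uniform(-1.0, 1.0, size=(1, 3, 64, 64)).astype(np.float32)
images = decoded_to_uint8(fake_decoded)
print(images.shape, images.dtype)
```

If Pillow is installed, each `images[i]` can then be passed to `PIL.Image.fromarray` to get an image object.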