Lesson 19 official topic

I think the term “iterative refinement” simply indicates that these procedures are executed iteratively: algorithms of this kind can’t deliver a result in a single step. (For example, as Johno explained in Lesson 9A, you can’t just subtract all the noise at once in a DDPM; you have to go step by step, iteratively refining the noisy images until, after the last iteration, the distribution of the final results is as close as possible to the distribution of the original data points.)

The terms diffusion, stable diffusion and latent diffusion all trace back to the DDPM paper discussed in the lesson. In all of these models the data is iteratively covered with white Gaussian noise until it finally diffuses completely, leaving nothing but pure white Gaussian noise. The generative process then starts from that pure noise and removes a bit of it step by step, refining the signal iteratively until the result’s distribution is as close as possible to that of the training data.
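A rough numerical sketch of the forward (noising) half of this, assuming the linear β schedule from the DDPM paper (the dimensions and the stand-in data here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear beta schedule, as in the DDPM paper.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # fraction of signal remaining after t steps

def q_sample(x0, t, rng):
    """Forward process: jump straight to step t by adding Gaussian noise,
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = rng.standard_normal(16)      # stand-in for one data sample
x_T = q_sample(x0, T - 1, rng)    # after the last step: almost pure noise
# alpha_bars[-1] is tiny, so virtually no signal survives the forward pass.
```

The reverse (generative) process has to undo this one small step at a time, which is exactly the iterative refinement the thread is about.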

However, it would also be possible to diffuse the data into something other than white Gaussian noise, for example noise drawn from a Gamma distribution (I’ve seen this in this paper: https://arxiv.org/pdf/2106.07582.pdf). This is where the terminology starts to get really confusing, because DDPMs always use white Gaussian noise. So I guess you could go back to the original thermodynamics paper from 2015 that was the inspiration for the 2020 DDPM paper. In that 2015 paper they characterised the diffusion process as “destroying” the original data distribution. So in a diffusion model your data always gets “destroyed” somehow: it could be white noise, pink noise, a mixture of noises, or Gamma-distributed noise. The basic idea is just to add so much of it that the original data diffuses into it.
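A tiny sketch of the non-Gaussian point: the corrupting signal can be drawn from a Gamma distribution instead of a Gaussian. The shape/scale values below are made up for illustration, not the ones used in the cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Standard Gaussian noise, as used in a DDPM.
gaussian_noise = rng.standard_normal(n)

# Gamma-distributed noise, centred to zero mean by subtracting
# its mean k * theta (shape/scale values are arbitrary here).
k, theta = 2.0, 1.0
gamma_noise = rng.gamma(k, theta, n) - k * theta

# Both have roughly zero mean, but the Gamma noise is skewed:
# it is bounded below at -k * theta and has a long right tail.
```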

However, “iterative refinement” is a great term because it also covers models that do not destroy their data distributions, for example generative adversarial networks (GANs) or variational autoencoders (VAEs). A variational autoencoder does not diffuse its data into noise or any other signal. To simplify, one could say that the encoder inside a VAE compresses its data into a more compact representation of itself in the form of a latent vector. The decoder then tries to refine that compressed data into a representation as close as possible to the data that was encoded. So the generative idea is similar, but the data does not diffuse in these models.
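To illustrate the compress-then-decode idea, here is an untrained toy with random linear maps, purely to show the shapes involved (a real VAE’s encoder outputs a mean and variance, samples a latent from them, and is trained end-to-end; all dimensions below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 64-dim data compressed into an 8-dim latent (illustrative).
data_dim, latent_dim = 64, 8
W_enc = rng.standard_normal((latent_dim, data_dim)) / np.sqrt(data_dim)
W_dec = rng.standard_normal((data_dim, latent_dim)) / np.sqrt(latent_dim)

def encode(x):
    # Encoder: compress the data into a compact latent vector.
    return W_enc @ x

def decode(z):
    # Decoder: expand the latent vector back toward the data space.
    return W_dec @ z

x = rng.standard_normal(data_dim)
z = encode(x)       # compact latent representation
x_hat = decode(z)   # reconstruction attempt, back in data space
```

The data is squeezed through a bottleneck and expanded again; nothing is ever diffused into noise along the way.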

There are also many other approaches and techniques that could be listed here. Since this is a very young research field, a lot of publications will follow in the next few decades. They may all have different implementations and technically be subclasses of something else, but as a superordinate class for all of these procedures, I think the term iterative refinement is a very good fit.
