It was mentioned that when getting the VAE latent embeddings, the constant 0.18215 was used to scale the latents in the original paper. Was there a reason this specific number was picked (i.e. does it have some property), or was it more “we tried many values and this one seemed to work best”?
Here is an explanation directly from the lead author/developer of latent diffusion and Stable Diffusion:
We introduced the scale factor in the latent diffusion paper. The goal was to handle different latent spaces (from different autoencoders, which can be scaled quite differently from images) with similar noise schedules. The scale_factor ensures that the initial latent space on which the diffusion model is operating has approximately unit variance. Hope this helps.
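In other words, the scale factor is just the reciprocal of the (empirical) standard deviation of the encoder's latents, so that the diffusion model sees roughly unit-variance inputs. A minimal sketch of the idea, using a random array as a hypothetical stand-in for a batch of VAE latents (in practice you would encode a sample of training images with the autoencoder's encoder):

```python
import numpy as np

# Hypothetical stand-in for a batch of VAE latents; a real batch would come
# from encoding images with the autoencoder. The std of ~5.49 mimics a
# latent space whose reciprocal std is close to SD's 0.18215.
rng = np.random.default_rng(0)
latents = rng.normal(loc=0.0, scale=5.49, size=(64, 4, 32, 32))

# Pick the scale factor so that scaled latents have ~unit variance.
scale_factor = 1.0 / latents.std()
scaled = latents * scale_factor

print(float(scaled.std()))  # ~1.0, by construction
```

The same factor is then divided back out before decoding, so the autoencoder itself is untouched; only the diffusion model's view of the latent space is normalized.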
I like Understanding Diffusion Models: A Unified Perspective, although it took some time (and pain) to go through. The author went through every single line of math with some sort of annotation or explanation, without skipping any steps. That takes away a lot of the guesswork.
100%. This level of detail is not needed to train or run inference with Stable Diffusion. However, it is the perfect resource for people who want to go deep and fully understand the math.
I just added this talk on the 2015 paper by Jascha Sohl-Dickstein (lead author) to the wiki, but wanted to highlight here since I think it’s great and I haven’t seen it mentioned before: