Stable diffusion: resources and discussion

I haven’t been able to get the notebooks working on my M1 - I just get noise (and also hit float64 issues, since MPS doesn’t support that dtype). Running on the CPU worked OK, but the speed dropped to about 2:40 per image. I switched over to my old box, a Linux machine with a 1080 Ti, where the default dream.py images are produced in around 11 seconds - so I might see if I can get the notebooks working there.

I could run the whole notebook (except the last cell) on a Linux 1080 with 8 GB and avoid a CUDA out-of-memory error by reducing the width by half:

height = 512
width = 256

I didn’t get the same image as in the original notebook, but pretty decent “half” images.
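The memory saving is easy to see with some back-of-the-envelope arithmetic (an illustration, not the actual notebook code): Stable Diffusion’s UNet works on latents of shape (4, H/8, W/8), so halving the image width halves every latent and activation tensor as well.

```python
def latent_numel(height, width, channels=4, downsample=8):
    """Number of elements in the SD latent for a given image size.

    SD v1 downsamples by 8x in each spatial dimension and uses 4 latent
    channels, so a 512x512 image becomes a (4, 64, 64) latent.
    """
    return channels * (height // downsample) * (width // downsample)

full = latent_numel(512, 512)   # 4 * 64 * 64 = 16384 elements
half = latent_numel(512, 256)   # 4 * 64 * 32 =  8192 elements
print(full, half)
```

Hence halving the width roughly halves the working set in the UNet, which is what lets the notebook squeeze under the 8 GB limit.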

2 Likes

Thanks - and I saw the same memory issues. The deep dive notebook also runs well on the 1080TI with some kernel restarts ;).

1 Like

Sharing this video, suggested by Tanishq on Twitter. A nice overview of SD, although he does leave out some details - for example, that the noise is added to the VAE-encoded latents, not to the original image.
Overall a nice video.
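That detail - noising the latents rather than the pixels - can be sketched as follows. This is a toy NumPy illustration with made-up shapes and a standard DDPM linear schedule, not the actual SD code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend VAE encoding: a 512x512 RGB image becomes a (4, 64, 64) latent.
latents = rng.standard_normal((4, 64, 64))

# Standard DDPM linear beta schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def add_noise(x0, t):
    """q(x_t | x_0): applied to the *latents*, not the original image."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

noisy = add_noise(latents, t=500)
print(noisy.shape)  # same shape as the latent, (4, 64, 64)
```

The UNet is then trained to predict `eps` from `noisy` - all in the 64x64 latent space, which is what makes SD so much cheaper than pixel-space diffusion.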

3 Likes

It’s possible to run all cells in one go on an Nvidia 1080 8 GB with some minor modifications to the notebook, mostly consisting of splitting up some large cells into smaller ones.

Also, the smaller images look better in landscape: height=256 and width=512.

4 Likes

I was curious about training diffusion models for generating text, so I have put together a minimal implementation here. It can be used to train an unconditional generative model of text, and also includes a small implementation of classifier guidance for conditional generation. I’ll be happy to answer questions about the implementation or diffusion models in general! It also includes a denoising/generative sampling loop visualization:

minimal-text-diffusion
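The classifier guidance idea mentioned above can be shown with a toy 1D NumPy sketch (my own illustration, not code from the repo): the guided score is the unconditional score plus a scaled gradient of the classifier’s log-likelihood, and sampling with it pulls samples toward the desired class.

```python
import numpy as np

rng = np.random.default_rng(0)

def uncond_score(x):
    # Score of a toy unconditional "model", N(0, 1): d/dx log p(x) = -x.
    return -x

def classifier_grad(x, class_mean=3.0):
    # Gradient of log p(y | x) for a Gaussian classifier centred at class_mean.
    return class_mean - x

def guided_sample(n=20000, steps=500, step=0.05, scale=1.0):
    """Langevin sampling with the guided score:
    score = uncond_score + scale * grad log p(y | x)."""
    x = rng.standard_normal(n)
    for _ in range(steps):
        score = uncond_score(x) + scale * classifier_grad(x)
        x = x + step * score + np.sqrt(2 * step) * rng.standard_normal(n)
    return x

samples = guided_sample()
# Guidance pulls samples from the unconditional mean (0) toward the class
# mean (3); with scale=1 the stationary mean is scale*3/(1+scale) = 1.5.
print(samples.mean())
```

In the text-diffusion setting the classifier gradient is taken in the learned embedding/latent space rather than over a 1D toy density, but the combination of scores works the same way.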

9 Likes

Just read this paper yesterday. The idea of treating the “noise” that you use in diffusion models as a latent vector is a very surprising but powerful concept in my opinion. Lots to explore there…

3 Likes

Another interesting application of diffusion to create different 3D views of an object from a single image + the object’s pose:
https://3d-diffusion.github.io/

2 Likes

It would be nice to generate a mesh from a 3D point-cloud input with SD.

What’s the current SOTA resource for Stable Diffusion with respect to inference speed? Is there a resource somewhere that tries to keep track of “SOTA” versions of SD for things like inference speed, memory usage, etc.?

This Computerphile video is good:

4 Likes

A really good mathematical explanation of diffusion models:

2 Likes

I plan to work on text generation, so I’m glad to find your work as a starting point. I see a list of references in your repo - which ones do you recommend reading first for the uninitiated?

1 Like

[2205.14217] Diffusion-LM Improves Controllable Text Generation might be a good starting point, but there are a ton of caveats that come with any approach that manipulates latent space for text generation (GANs/VAEs/Normalizing flows).

I plan to add a blog/tutorial in the next week or so as well.

Good luck - I’m looking forward to hearing more about what you find/come up with!

2 Likes

Ha! The video mentions the “enhance” meme. Only yesterday on the drive to work I was thinking: the awful thing about this ML denoising and generative upsizing is that it undermines all my previous vocal criticism of TV memes where technology can super-“enhance” poor security footage.

1 Like

Random thought: learning about Stable Diffusion, I’m surprised how accurately it resembles Michelangelo’s famous quote: “The sculpture is already complete within the marble block, before I start my work. It is already there, I just have to chisel away the superfluous material.” It can be paraphrased as “The image is already complete before I start my work. It is already there, I just have to remove the superfluous noise.” :crazy_face:

6 Likes

There are two new VAEs out from Stability. These don’t affect the form of the image, only the final decode step, where the model scales up from the 64x64 latents to a 512x512 image. The new decoders give much better results for details such as small faces, eyes, and lettering.

5 Likes

Replace this line:
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")
with:
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")

4 Likes

Or stabilityai/sd-vae-ft-mse, which was trained for more steps and seems to give better results.

3 Likes