Lesson 10 official topic

How is the “padding prompt” useful?

Imagic: Text-Based Real Image Editing with Diffusion Models

16 Likes

The first step of creating an optimized embedding is not clear. Aren’t we creating an embedding with CLIP, which is input to the pre-trained diffusion model to generate the image? Also, what do they mean when they say “optimized”?

2 Likes

You can take the embeddings from the CLIP text encoder and set up an optimizer to modify them. You use them to ‘denoise’ the image, see how well that did, then update the embeddings accordingly. The idea is that this gives a new set of embeddings that, when fed to the UNet, end up generating images that look more like the input example.
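Here’s a minimal sketch of that loop, assuming a frozen diffusers unet and scheduler plus the VAE-encoded image_latents and initial CLIP text_embeddings are already in scope (those names are placeholders, not the paper’s code):

```python
import torch
import torch.nn.functional as F

# Imagic step 1 (sketch): optimize the text embeddings against a frozen UNet.
emb = text_embeddings.clone().detach().requires_grad_(True)
opt = torch.optim.Adam([emb], lr=1e-3)

for step in range(500):
    # Standard diffusion training objective, but minimized w.r.t. the embeddings
    noise = torch.randn_like(image_latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (1,),
                      device=image_latents.device)
    noisy = scheduler.add_noise(image_latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=emb).sample
    loss = F.mse_loss(pred, noise)  # how well did these embeddings denoise?
    opt.zero_grad()
    loss.backward()  # gradients flow into emb; the UNet stays frozen
    opt.step()
```

The only trainable tensor is emb, so the “optimized embedding” is just the text embedding after this gradient descent.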

6 Likes

The Hugging Face diffusers notebook for Dreambooth fine-tunes Stable Diffusion, so it might be a good starting point for implementing an SD version of Imagic: GitHub, Colab.

13 Likes

So this is yet another instance where we freeze the parameters and fine-tune the input. Except that this time we freeze the image as well and optimize the text embeddings, whereas with normal inference we freeze the text embeddings and optimize the image latents :thinking:

1 Like

How does the color/type of noise affect the results? Is Gaussian explicitly required due to how the model was trained?

1 Like

Yes, Stable Diffusion was trained with Gaussian noise, but recent research suggests you could train with other types of noise as well, with varying levels of success.

6 Likes

Noise is added to the VAE latent codes (a 64x64 “image” with 4 channels) - there is no concept of color there: the pixel values in those 4 channels are just “semantics” learned by the VAE.

Picture is from the accompanying notebook by @johnowhitaker.
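Roughly, with the diffusers API (a sketch, assuming a vae and scheduler in scope and an image tensor of shape (1, 3, 512, 512) scaled to [-1, 1]; 0.18215 is the latent scale factor Stable Diffusion uses):

```python
import torch

# Encode to the 4-channel latent space, then add Gaussian noise there.
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample() * 0.18215  # (1, 4, 64, 64)

noise = torch.randn_like(latents)  # one Gaussian sample per latent "pixel"
t = torch.tensor([500])            # an arbitrary timestep for illustration
noisy_latents = scheduler.add_noise(latents, noise, t)
```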

5 Likes

I noticed that before the sampling loop the latents are multiplied by scheduler.init_noise_sigma, but there is also scheduler.scale_model_input in each iteration. What is the multiplication by init_noise_sigma for?

2 Likes

(Sander Dieleman’s blog on Guidance)

6 Likes

From the notebook, it is used for scaling the latents. The second one, scale_model_input, implements a different formula, as you can see: latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)

1 Like

If you take an image and add lots of noise (equivalent to the highest ‘timestep’ during training) you’ll get a result with a standard deviation of ~14 (the max sigma value used during training). Whereas torch.randn gives something with std 1. So, we scale by sigma_max (aka init_noise_sigma) to get something that looks more like the noisiest images the model saw during training.

Now the model inputs are not the raw noisy latents - they are a scaled version. Just a choice from the designers. So we get a second scaling bit to get the actual model inputs, which is handled by scheduler.scale_model_input.
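Putting the two scalings side by side, here’s a sketch of a sampling loop with a diffusers scheduler such as EulerDiscreteScheduler (unet and text_embeddings assumed in scope):

```python
import torch

scheduler.set_timesteps(50)

# 1) One-off scaling: torch.randn has std ~1, but the noisiest training
#    inputs had std ~sigma_max (~14), so scale up once before the loop.
latents = torch.randn(1, 4, 64, 64) * scheduler.init_noise_sigma

for t in scheduler.timesteps:
    # 2) Per-step scaling: the model was trained on scaled inputs,
    #    latents / (sigma**2 + 1)**0.5, which scale_model_input applies.
    latent_model_input = scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_pred = unet(latent_model_input, t,
                          encoder_hidden_states=text_embeddings).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample
```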

10 Likes

APL / array programming topic: Array programming - fast.ai Course Forums

12 Likes

Could one implement everything in part 2 in APL or one of the array programming languages? Would we hit GPU support issues soon? Maybe with MNIST it’d be possible?

5 Likes

Yes that should be fine!

3 Likes

Thanks for the lesson. Building from scratch is awesome.
I had a question regarding the random number generator issue with PyTorch and NumPy. What was the issue with having the same random numbers generated, in the context of DL?

If two processes of a DataLoader are generating the same random numbers, then they’re generating the same “randomly” augmented images!
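Here’s a minimal repro and the usual fix, assuming PyTorch’s DataLoader on a platform that forks worker processes (e.g. Linux):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class RandDataset(Dataset):
    def __len__(self):
        return 8
    def __getitem__(self, i):
        return np.random.random()  # stands in for a random augmentation

# With num_workers > 0, each forked worker inherits the same NumPy RNG
# state, so the "random" values (augmentations) can repeat across workers.
loader = DataLoader(RandDataset(), batch_size=2, num_workers=2)
print(list(loader))

# Common fix: reseed NumPy once per worker from the worker's torch seed.
def worker_init_fn(worker_id):
    np.random.seed(torch.initial_seed() % 2**32)

loader = DataLoader(RandDataset(), batch_size=2, num_workers=2,
                    worker_init_fn=worker_init_fn)
print(list(loader))  # now every worker draws different values
```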

17 Likes

Wow :zap: @Justinpinkney just released an Imagic notebook that uses Stable Diffusion:

13 Likes

See Stable diffusion: resources and discussion - #40 for some tips on avoiding CUDA memory issues.
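For quick reference, two of the commonly suggested savers, using the standard diffusers pipeline API (the linked thread has more):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,   # half precision roughly halves VRAM use
).to("cuda")
pipe.enable_attention_slicing()  # trades a little speed for lower peak VRAM
```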

4 Likes