Lesson 9 (part 2) preview

Hi Alex, one solution is to build the prompt_embeds manually and pass them to the pipeline. E.g.:

from diffusers import StableDiffusionPipeline
import torch

# 1. Load the model
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# 2. Forward embeddings and negative embeddings through text encoder
# Deliberately long prompt, way past CLIP's 77-token limit
# (note the trailing space so the repeats don't run together)
prompt = 25 * "a photo of an astronaut riding a horse on mars "
max_length = pipe.tokenizer.model_max_length

input_ids = pipe.tokenizer(prompt, return_tensors="pt", truncation=False).input_ids
input_ids = input_ids.to("cuda")

negative_ids = pipe.tokenizer("", truncation=False, padding="max_length", max_length=input_ids.shape[-1], return_tensors="pt").input_ids
negative_ids = negative_ids.to("cuda")

concat_embeds = []
neg_embeds = []
# Encode in max_length-sized (77-token) chunks, since CLIP can't attend past that,
# then concatenate the per-chunk embeddings along the sequence dimension
for i in range(0, input_ids.shape[-1], max_length):
    concat_embeds.append(pipe.text_encoder(input_ids[:, i: i + max_length])[0])
    neg_embeds.append(pipe.text_encoder(negative_ids[:, i: i + max_length])[0])

prompt_embeds = torch.cat(concat_embeds, dim=1)
negative_prompt_embeds = torch.cat(neg_embeds, dim=1)

# 3. Generate the image from the precomputed embeddings
image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds).images[0]
image.save("astronaut_rides_horse.png")
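
One detail worth noting: the empty negative prompt is padded out to the same length as input_ids, so negative_prompt_embeds ends up with the same shape as prompt_embeds, which the pipeline needs for classifier-free guidance.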

BTW, would you like to share your existing prompts dataset? Haha, I'm collecting prompts to build a prompt recommendation system for the community :)

Some fun portraits of my cat Astroid. The top & bottom right really capture his expression.


Small bug I noticed while watching the video for image-to-image.

Jeremy has his noise strength set to 1.0 (“oil painting of wolf howling at the moon by Van Gogh”). I believe this obliterates the original source image into complete noise, with 0 being no noise added. So it still works, but only because it is effectively operating as text-to-image.

Turn the strength down to 0.7 and you get closer to what image-to-image should do: keep the basic shape of your input image, but reimagine it.
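
For anyone who wants to poke at this, here's a minimal sketch of the strength knob using diffusers' img2img pipeline (the input file name is a placeholder, and outputs will vary):

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("wolf_sketch.png").convert("RGB").resize((512, 512))  # hypothetical input
prompt = "oil painting of wolf howling at the moon by Van Gogh"

# strength=1.0 replaces the input with pure noise (effectively text-to-image);
# lower values keep more of the input's structure
for strength in (1.0, 0.7):
    image = pipe(prompt=prompt, image=init_image, strength=strength).images[0]
    image.save(f"wolf_strength_{strength}.png")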


I have an 8 GB GPU and I was able to get the deep dive notebook mostly working with half precision.

The code is available at diffusion-nbs/Stable Diffusion Deep Dive.ipynb at master · vishakh/diffusion-nbs · GitHub.

The caveat is that the last cell still runs out of memory. :)
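
In case it helps others on small cards, here's a minimal sketch of the usual memory savers in diffusers: half precision plus attention slicing. Whether it's enough for that last cell, I can't say:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # halves memory for weights and activations
).to("cuda")

# Compute attention in slices instead of one big batch; slower but much lighter on VRAM
pipe.enable_attention_slicing()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]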

I was wondering: when training the U-Net for a Stable Diffusion pipeline, what kinds of augmentations can you still do on the latents?

I mean, if you train a regular U-Net without an AE, we add data augmentations such as flipping, rotating, scaling, contrast, brightness, blur, and so on. But does that still work in the latent-representation space when we use an AE to compress our images down first?
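
Not an answer, but one cheap way to probe it: the VAE encoder is mostly convolutional, so you'd expect a horizontal flip in pixel space to roughly commute with a horizontal flip of the latents (geometric augmentations seem more likely to survive than photometric ones like contrast or blur). A rough check, assuming the SD 1.5 VAE and a placeholder image path:

import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to("cuda")

img = Image.open("some_image.png").convert("RGB").resize((512, 512))  # hypothetical path
x = torch.from_numpy(np.array(img)).permute(2, 0, 1).float()[None] / 127.5 - 1.0
x = x.to("cuda")

with torch.no_grad():
    # Use the distribution mean so the comparison is deterministic
    lat = vae.encode(x).latent_dist.mean
    lat_of_flip = vae.encode(torch.flip(x, dims=[-1])).latent_dist.mean

# If flipping commutes with encoding, this difference should be small
print((torch.flip(lat, dims=[-1]) - lat_of_flip).abs().mean())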

Great help here. I'd been stuck on this one for a week already, and the notebook didn't show the error; it just killed the kernel.
I found out about the error when I tried using IPython instead.
export LD_LIBRARY_PATH=/usr/lib/wsl/lib FIXED my problem on WSL2, as you said.

I also found this: Conda Pytorch (Pytorch channel) in WSL2 Ubuntu can't find libcudnn shared objects · Issue #85773 · pytorch/pytorch · GitHub
Tracked in: Convolutions are broken for PyTorch-2.0 CUDA-11.8 wheel builds · Issue #97041 · pytorch/pytorch · GitHub
Fixed by: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link · Issue #5663 · microsoft/WSL · GitHub