I am trying to use a DIY sampling loop to generate an image from a base image (img2img-style). For some reason the result is a blurry image that doesn't look like anything. Can you help me understand where I went wrong?
import torch
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from PIL import Image
from diffusers import DDIMScheduler
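# `pipe` and `device` are used further down but are not defined in this
# snippet -- they presumably come from an earlier notebook cell. A minimal
# sketch of that setup (the model ID here is an assumption, not taken from
# the original):
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)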
# Load and preprocess the image into a (1, 3, H, W) float tensor in [0, 1]
def preprocess_image(image_path, target_size=(256, 256)):
    image = Image.open(image_path).convert('RGB')
    image = image.resize(target_size)
    image = np.array(image) / 255.0
    image = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float()
    return image
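# Example usage (the path is illustrative):
# x = preprocess_image("photo.jpg")  # -> tensor of shape (1, 3, 256, 256)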
# Initialize the DDIMScheduler
scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
)
# Set the number of inference steps
num_inference_steps = 50
# Load and preprocess your image
image_path = "/kaggle/input/imageforlatent/Untitled.jpg"
image = preprocess_image(image_path)
# Subsample the scheduler's 1000 training timesteps down to num_inference_steps
timesteps = scheduler.timesteps[::len(scheduler.timesteps) // num_inference_steps]
timesteps = timesteps[:num_inference_steps]  # ensure exactly num_inference_steps entries
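# An equivalent, less error-prone way to get the same spacing is the
# scheduler's own API (assuming a recent diffusers version):
# scheduler.set_timesteps(num_inference_steps)
# timesteps = scheduler.timesteps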
# Use the scheduler to add noise to the initial image
# Generate Gaussian noise with the same shape as the image
noise = torch.randn_like(image)
# Add noise in a loop so we can visualise the progression
# (flipping the timesteps so they run from least to most noisy)
for step, t in enumerate(tqdm(timesteps.flip(0))):
    # Add noise at timestep t (always starting from the clean image)
    noisy_image = scheduler.add_noise(image, noise, t)
    # Plot the noisy image every 10 steps
    if step % 10 == 0:
        noisy_image_np = noisy_image.squeeze().permute(1, 2, 0).clamp(0, 1).numpy()
        plt.imshow(noisy_image_np)
        plt.show()
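# Note: only the final iteration's result survives the loop, so `noisy_image`
# ends up as the image noised at the largest timestep (close to pure noise).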
noisy_image = noisy_image.to(device)
# Encode the noisy image into latent space with the pipeline's VAE
with torch.no_grad():
    latents = 0.18215 * pipe.vae.encode(noisy_image).latent_dist.mean
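# (0.18215 is Stable Diffusion v1's latent scaling factor; in diffusers it
# is also exposed as pipe.vae.config.scaling_factor)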
# DIY loop to generate an image from the noisy latents
guidance_scale = 8  # @param
num_inference_steps = 50  # @param
prompt = "a man and a moon"  # @param
negative_prompt = "zoomed in, blurry, oversaturated, warped"  # @param
# Encode the prompt and negative prompt into text embeddings
text_embeddings = pipe._encode_prompt(prompt, device, 1, True, negative_prompt)
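# NOTE: _encode_prompt is a private pipeline helper; newer diffusers
# releases expose pipe.encode_prompt instead (which version applies here
# is an assumption).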
##### Using our own latents as the starting point
# (random starting point kept commented out, since we start from the VAE latents above)
# latents = torch.randn((1, 4, 64, 64), device=device, generator=generator)
# latents *= pipe.scheduler.init_noise_sigma
# Prepare the scheduler
pipe.scheduler.set_timesteps(num_inference_steps, device=device)
# Loop through the sampling timesteps
for i, t in enumerate(pipe.scheduler.timesteps):
    # Expand the latents since we are doing classifier-free guidance
    latent_model_input = torch.cat([latents] * 2)
    # Apply any scaling required by the scheduler
    latent_model_input = pipe.scheduler.scale_model_input(latent_model_input, t)
    # Predict the noise residual with the UNet
    with torch.no_grad():
        noise_pred = pipe.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
    # Perform classifier-free guidance
    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
    # Compute the previous noisy sample x_t -> x_{t-1}
    latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample
# Decode the resulting latents into an image
with torch.no_grad():
    image = pipe.decode_latents(latents.detach())
# View the result
pipe.numpy_to_pil(image)[0]
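# To save the result instead of just displaying it (filename illustrative):
# pipe.numpy_to_pil(image)[0].save("output.png")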
I tried the code above, but it only ever generates a blurry image.