I need some help with how to properly apply the generated mask during the denoising phase of DiffEdit.
If I’m understanding the steps of DiffEdit correctly, the mask is applied during every denoising step, which means the background pixels of the original image are added back into the latent at each step.
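For reference, the DiffEdit paper does this blend in latent space, not pixel space. After each decoding step, the latent is kept as-is inside the mask. Outside the mask, it is overwritten with the original image's latent at the same timestep. Below is a minimal NumPy sketch of that single operation. It assumes both latents have shape (C, H, W) and the mask is binary and already at latent resolution (H, W). The name `blend_latents` is a hypothetical helper, not a library function.

```python
import numpy as np


def blend_latents(edited: np.ndarray, original_t: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep `edited` where mask == 1 and `original_t` where mask == 0.

    edited:     latent produced by the current denoising step, shape (C, H, W)
    original_t: original image's latent at the same timestep, shape (C, H, W)
    mask:       binary mask at latent resolution, shape (H, W)
    """
    m = mask[None, :, :]  # add a channel axis so the mask broadcasts over C
    return m * edited + (1.0 - m) * original_t


# Tiny example: 4-channel 2x2 latents, mask selects the left column
edited = np.ones((4, 2, 2))
original_t = np.zeros((4, 2, 2))
mask = np.array([[1.0, 0.0], [1.0, 0.0]])
out = blend_latents(edited, original_t, mask)  # left column from `edited`, right from `original_t`
```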
However, when I try to implement this, I get the following result (I’m replacing a horse with a zebra).
This is what my denoising loop looks like:
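```python
import numpy as np
import torch
import torchvision.transforms.functional as F
from PIL import Image
from tqdm.auto import tqdm

# get_embs, get_lat, denoise, decompress, compress_img, neg_prompt, mask,
# sched and start_step are defined elsewhere in my notebook.
prompt = ['zebra']
img = Image.open('/content/img.png').resize((512, 512))
embs = get_embs(prompt, neg_prompt)
lat = get_lat(img)
# Background pixels of the original image (values in [0, 1])
inv_mask = 1 - mask
back = torch.mul(F.to_tensor(img).permute(1, 2, 0), torch.from_numpy(inv_mask))
for i, ts in enumerate(tqdm(sched.timesteps)):
    if i >= start_step:
        lat = denoise(lat, ts)
        # Latent of the background at step i, decoded back to pixels
        backn = get_lat(Image.fromarray((back*255).numpy().round().astype(np.uint8)), start_step=i)
        backn = decompress(backn)
        # Zebra pixels: decode the latent and keep only the masked region
        fore = torch.mul(torch.from_numpy(decompress(lat)), torch.from_numpy(mask))/255
        # Recombine foreground + background and encode back to a latent
        lat = compress_img(Image.fromarray(((fore+(backn/255))*255).numpy().round().astype(np.uint8)))
```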
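One detail matters if the blend is moved into latent space. Here, `mask` is at pixel resolution (512×512). This sketch assumes a Stable Diffusion-style VAE with a downsampling factor of 8, so the latent is 64×64. Below is a minimal way to downsample the mask: average each 8×8 block, then re-threshold. It assumes a NumPy mask of 0s and 1s with shape (H, W). If `mask` carries a trailing channel axis, as the broadcast against the (512, 512, 3) image tensor above suggests, take a single channel first (for example `mask[..., 0]`). The helper name `mask_to_latent` and the 0.5 threshold are illustrative choices, not part of DiffEdit itself.

```python
import numpy as np


def mask_to_latent(mask: np.ndarray, factor: int = 8, threshold: float = 0.5) -> np.ndarray:
    """Downsample an (H, W) binary pixel mask to (H // factor, W // factor).

    Each output cell is the mean of a factor x factor block of pixels,
    thresholded back to 0/1.
    """
    h, w = mask.shape
    assert h % factor == 0 and w % factor == 0, "mask size must be divisible by factor"
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    coverage = blocks.mean(axis=(1, 3))  # fraction of each block covered by the mask
    return (coverage >= threshold).astype(np.float32)


# Example: a 512x512 mask whose left half is 1 becomes a 64x64 mask whose left half is 1
pixel_mask = np.zeros((512, 512), dtype=np.float32)
pixel_mask[:, :256] = 1.0
lat_mask = mask_to_latent(pixel_mask)
```

A lower threshold grows the edit region slightly at the mask boundary. Max-pooling (threshold just above 0) is another common choice when you would rather over-cover the object than clip it.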
`back` holds the background pixels, which I obtain by inverting my mask and applying it to the original image.

`fore` holds the pixels that make up the zebra, which I obtain by decompressing the latent and applying the mask to it.

I then obtain the final latent by adding `fore` and `back` together and compressing the result for the next iteration of the loop.
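The same foreground/background composition can be done without leaving latent space. That avoids a VAE decode/encode and a uint8 round trip on every step. Below is a minimal sketch under several stated assumptions:

- The original image's clean latent is re-noised to each timestep with the DDPM forward formula x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. This is what `add_noise` computes for DDPM/DDIM-style schedulers in diffusers.
- The DiffEdit paper instead obtains these latents by DDIM inversion, so this is a simplification.
- `denoise_step` is a hypothetical placeholder for the real `denoise`.
- The linear `alphas_cumprod` schedule is a toy made up for illustration.

```python
import numpy as np

# Toy linear beta schedule (illustrative only); alpha_bar shrinks toward 0 as t grows
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)


def add_noise(x0, eps, t):
    """DDPM forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps


def denoise_step(lat, t):
    """Hypothetical placeholder for the real UNet prediction + scheduler step."""
    return 0.9 * lat


def masked_denoise(lat0, mask, timesteps, start_step, seed=0):
    """Denoise from timesteps[start_step], pinning the unmasked region to the original.

    lat0: clean latent of the original image, shape (C, H, W)
    mask: binary mask at latent resolution, shape (H, W); 1 = region to edit
    """
    eps = np.random.default_rng(seed).standard_normal(lat0.shape)  # one fixed noise sample
    m = mask[None]  # (1, H, W) broadcasts over the latent channels
    lat = add_noise(lat0, eps, timesteps[start_step])  # noisy starting latent
    for i in range(start_step, len(timesteps)):
        lat = denoise_step(lat, timesteps[i])
        # After this step the latent sits at the next timestep's noise level,
        # or is treated as clean after the final step.
        if i + 1 < len(timesteps):
            back = add_noise(lat0, eps, timesteps[i + 1])
        else:
            back = lat0
        lat = m * lat + (1.0 - m) * back  # edit inside the mask, original outside
    return lat


# Usage with dummy data
timesteps = np.arange(981, 0, -20)  # 50 descending steps: 981, 961, ..., 1
lat0 = np.random.default_rng(1).standard_normal((4, 64, 64))
mask = np.zeros((64, 64))
mask[:, :32] = 1.0
out = masked_denoise(lat0, mask, timesteps, start_step=25)
```

The background is always re-derived from the clean latent `lat0`, never from the previous iteration's output, so noise does not compound outside the mask.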
I’d really appreciate some help. If you need any more information to do so, please do let me know.