Lesson 11 official topic

barnacl · November 1, 2022, 2:30am

I think the mask is being used similarly in both cases, but one is in image space and the other in latent space.
If you look at johno’s notebook that converts a parrot into a nat geo dancer, we see something similar. There we were operating in latent space.
It was harder for me to grasp in that notebook with the prompt “A colorful dancer, nat geo photo”, so I replaced it with “a panda sitting”.

In the very first image you can see some color from the parrot in the panda.
This is what I think is happening.

Fahim · November 1, 2022, 2:42am

I am not absolutely sure that the mask is used the same way. I believe the effect in the example from johno’s notebook arises because the images are progressively mixed in as they move through the diffusion process. And I believe that’s what happens when you do the non-inpainting version of DiffEdit. Or rather, what I think is happening: you take only the parts of the image (in latent space) that are not masked and continue the diffusion process for that part with the edited prompt. The masked part can either be diffused using the original prompt and/or replaced by the final latent result from the original prompt, depending on how you do things. (I’m just phrasing things here as I see how this might work; the paper probably has a specific way to do this but I don’t recall …)

For the inpainting, my intuition is that it completely replaces the editable portion of the masked image with the results of the edit prompt instead of mixing the edit prompt result with the existing image. Which is why you get completely white stripes on the zebra with this approach, for example …

But I do acknowledge that I might have this totally wrong

barnacl · November 1, 2022, 2:48am

Ah, I agree with you about johno’s notebook not having masks, but what I was trying to say is that when we are in latent space and mask things out, that doesn’t translate directly to image space. In the inpainting case the masked-out parts are lost, but in latent space the masked parts are not lost, since the neighboring pixels in the latent space carry information about them (due to compression).

PS: are we both saying the same thing?

jeremy · November 1, 2022, 3:04am

For folks looking to improve their masks, you might find morphological operations useful:

https://docs.opencv.org/3.4/d9/d61/tutorial_py_morphological_ops.html
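
To make the idea concrete, here is a minimal pure-Python sketch of what a morphological "closing" (dilation followed by erosion) does to a binary mask with a small hole in it. The function names `dilate`, `erode`, and `close_mask` are made up for illustration; in practice OpenCV's `cv2.dilate`, `cv2.erode`, and `cv2.morphologyEx` (with `cv2.MORPH_CLOSE`) cover the same operations, and their border handling may differ from this toy version.

```python
def _window(mask, y, x, k):
    """Values of `mask` in the (2k+1)x(2k+1) square around (y, x), clipped to the image."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - k), min(h, y + k + 1))
            for i in range(max(0, x - k), min(w, x + k + 1))]

def dilate(mask, k=1):
    """A pixel becomes 1 if ANY pixel in its window is 1 (grows the mask)."""
    return [[int(any(_window(mask, y, x, k))) for x in range(len(mask[0]))]
            for y in range(len(mask))]

def erode(mask, k=1):
    """A pixel stays 1 only if ALL pixels in its window are 1 (shrinks the mask)."""
    return [[int(all(_window(mask, y, x, k))) for x in range(len(mask[0]))]
            for y in range(len(mask))]

def close_mask(mask, k=1):
    """Closing = dilation then erosion: fills small holes, keeps the outline."""
    return erode(dilate(mask, k), k)

# Toy 7x7 mask: a 3x3 block of ones with a hole in the middle.
mask = [[1 if 2 <= y <= 4 and 2 <= x <= 4 and (y, x) != (3, 3) else 0
         for x in range(7)] for y in range(7)]
closed = close_mask(mask)

for row in closed:
    print(row)
```

After closing, the hole at the centre is filled while the outer 3x3 outline is unchanged, which is the kind of clean-up that helps with speckled DiffEdit masks.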

Fahim · November 1, 2022, 3:13am

I think we are saying the same thing

Fahim · November 1, 2022, 3:16am

Thanks, Jeremy! I found the whole Python tutorials section on OpenCV very helpful and illuminating. I spent hours trying out different things but somehow missed the morphological operations …

johnri99 · November 4, 2022, 12:51pm

Following @jeremy’s advice I decided to try Quarto. It seems really good; I published the notebook used above here: https://fromlittleacorns.github.io/fastai-course-22p/. The next step is to make it into a proper website so that I can have multiple notebooks etc. It seems a nice tool, and I got it working from JarvisLabs without any issues.

aayushmnit · November 4, 2022, 5:45pm

@johnri99 Feel free to use my website code for styling or structuring: aayushmnit/aayushmnit.github.io on GitHub.

I have found it helpful to look at the code of other websites built on Quarto while building my own.

johnri99 · November 4, 2022, 7:20pm

Thanks @aayushmnit, I will certainly do so. Your site looks very similar to what I had in mind.

fredguth · November 5, 2022, 12:48am

Quarto lets you generate the HTML locally (from the Jupyter notebook), and you can post just the HTML to GitHub.

AllenK · November 15, 2022, 4:56am

From code snippet to LaTeX

latexify is a Python package that compiles a fragment of Python source code into a corresponding LaTeX expression.

marcossantana · November 16, 2022, 5:31pm

I was going through the lesson again and trying to read some papers from my field.
I came across this matrix notation and got stuck:

$A \in \mathbb{R}^{D|V| \times 2D|V|}$

To me it seems matrix A has shape (D x 2D) OR (V x V). Is this the right way to interpret it?

ababino · December 9, 2022, 2:32pm

Hi, I wrote this notebook implementing broadcasting from the foundations as an exercise. I thought I’d add some prose and turn it into a blog post using Quarto. Any feedback is welcome!

ababino · December 9, 2022, 9:15pm

Maybe |V| is an integer (probably the norm of a vector or the absolute value of an integer); in that case D|V| should be read as “D times |V|”. So D|V| is the number of rows and 2D|V| is the number of columns. Does that make sense?
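
A small worked example of this reading, assuming |V| is a non-negative integer (for instance the number of elements in a set V) and picking the purely illustrative values D = 4 and |V| = 3:

```latex
A \in \mathbb{R}^{D|V| \times 2D|V|}, \qquad D = 4,\ |V| = 3
\;\Longrightarrow\; A \in \mathbb{R}^{(4 \cdot 3) \times (2 \cdot 4 \cdot 3)} = \mathbb{R}^{12 \times 24}
```

So under these assumed values A would have 12 rows and 24 columns.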

jeremy · December 13, 2022, 3:04am

Very cool! I’d love to see a blogpost

manojmohan · April 23, 2023, 4:50am

The All in One Mathematics cheatsheet link is broken. What does it contain? Can somebody please share an alternate link?

ForBo7 · May 11, 2023, 7:33am

I’m trying to implement the DiffEdit paper, and to do so, I need to add noise to the input image.

The following is how I’m doing it.

img = Image.open('/content/planet.png').resize((512, 512))

import torchvision.transforms as T
with torch.no_grad():
  lat = vae.encode(T.ToTensor()(img).unsqueeze(0).half().to('cuda')*2-1)
  lat = 0.18215 * lat.latent_dist.sample()

sched = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule='scaled_linear',
    num_train_timesteps=1000
)

sched.set_timesteps(15)

noise = torch.randn_like(lat)
ts = tensor([sched.timesteps[10]])
lat = sched.add_noise(lat, noise, timesteps=ts)

However, the last cell outputs the following error and I’m baffled as to why it’s occurring.

RuntimeError: a Tensor with 0 elements cannot be converted to Scalar

Full Traceback

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-24-8d2efdc445c5> in <cell line: 3>()
      1 noise = torch.randn_like(lat)
      2 ts = 10
----> 3 lat = sched.add_noise(lat, noise, timesteps=tensor([sched.timesteps[ts]]))

/usr/local/lib/python3.10/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py in add_noise(self, original_samples, noise, timesteps)
    302         timesteps = timesteps.to(original_samples.device)
    303
--> 304         step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]
    305
    306         sigma = sigmas[step_indices].flatten()

/usr/local/lib/python3.10/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py in <listcomp>(.0)
    302         timesteps = timesteps.to(original_samples.device)
    303
--> 304         step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]
    305
    306         sigma = sigmas[step_indices].flatten()

RuntimeError: a Tensor with 0 elements cannot be converted to Scalar

I’ve thoroughly checked the tensors that are being used and none of them have 0 elements. I’ve also tried directly editing the add_noise method, but any changes I make to it don’t seem to register (e.g., adding a print statement causes the same error to be thrown, and the traceback says it’s occurring at that print statement). I’m doing this on Google Colab.

You can view the code on Colab here: Google Colab
The relevant code is under the “Add Noise to Image” header (there are two headers of the same name in the notebook; it’s the first one that is the relevant one).

I’d really appreciate help; I’m baffled.

ForBo7 · May 12, 2023, 9:29am

Figured out what was causing the runtime error when I decided to manually implement the sched.add_noise method.

The scheduler’s timesteps were of type float64. Yet when I tried to obtain the sigma at timestep 10, the scheduler returned, for some reason, a tensor of type float32.

The sched.add_noise method involves comparing all the timesteps inside the scheduler, and the specific timesteps you pass in to the sched.add_noise method (only timestep 10 in this case). A boolean tensor is then created, containing True for each specific timestep that exists inside the scheduler timesteps, and False for those that don’t. And from that, another tensor containing only True timesteps is created.

Because the scheduler’s timesteps and the timestep 10 I passed in had different precisions, the comparison produced a boolean tensor consisting purely of False values, despite timestep 10 existing in the scheduler’s timesteps (i.e. 0.00010002 ≠ 0.0001). Hence a tensor containing 0 elements was created.

To fix this, I set the precision of the scheduler’s timesteps to float32.

sched.timesteps = sched.timesteps.to(torch.float32)
ts = tensor([sched.timesteps[10]])

And the error was solved!
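
The effect can be reproduced without any ML libraries. The sketch below is only an emulation, not the diffusers code: it uses Python floats (which are float64) for the schedule and `struct` to round a value to float32. The helper name `as_float32` and the 15 evenly spaced timesteps are assumptions chosen to mirror the setup above.

```python
import struct

def as_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# 15 evenly spaced timesteps from 999 down to 0, held at float64 precision
# (an assumption that mirrors sched.set_timesteps(15)).
timesteps64 = [999 * i / 14 for i in range(15)][::-1]

# The lookup value after it has been rounded to float32.
t32 = as_float32(timesteps64[10])

# Exact-equality lookup, analogous to (schedule_timesteps == t).nonzero().
matches_mixed = [i for i, s in enumerate(timesteps64) if s == t32]

# The fix from the post: keep both sides at the same precision.
timesteps32 = [as_float32(s) for s in timesteps64]
matches_same = [i for i, s in enumerate(timesteps32) if s == timesteps32[10]]

print(timesteps64[10], t32)  # the two values differ in the low digits
print(matches_mixed)         # [] -> calling .item() on this is what fails
print(matches_same)          # [10]
```

An alternative that avoids exact float comparison altogether would be to look the timestep up by index, or to compare with a small tolerance, but that would mean not using `add_noise` as it is written.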

Man, it’s satisfying figuring out what’s happening and managing to solve it heh.

ForBo7 · May 18, 2023, 12:54pm

I need some help on how to properly apply the generated mask during the denoising phase of DiffEdit.

If I’m understanding the steps of DiffEdit correctly, the mask is applied during each denoising step — meaning the background pixels of the original image are added to the latent at each denoising step.

However, when I try to implement this, I get the following result (I’m replacing a horse with a zebra).

This is what my denoising loop looks like.

prompt = ['zebra']
img = Image.open('/content/img.png').resize((512, 512))
embs = get_embs(prompt, neg_prompt)
lat = get_lat(img)
inv_mask = 1 - mask
back = torch.mul(F.to_tensor(img).permute(1, 2, 0), torch.from_numpy(inv_mask))


for i, ts in enumerate(tqdm(sched.timesteps)):
  if i >= start_step: 
    lat = denoise(lat, ts)
    back = get_lat(Image.fromarray((back*255).numpy().round().astype(np.uint8)), start_step=i)
    back = decompress(back)
    fore = torch.mul(torch.from_numpy(decompress(lat)), torch.from_numpy(mask))/255
    lat = compress_img(Image.fromarray(((fore+(backn/255))*255).numpy().round().astype(np.uint8)))

back is the background pixels, which I obtain by inverting my mask and applying it to the original image.

fore is the pixels comprising the zebra, which I obtain by decompressing the latent and applying the mask to it.

I then obtain the final latent by adding fore + back together, and then compressing it for the next loop.

I’d really appreciate some help. If you need any more information to do so, please do let me know.
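
For comparison, here is a minimal sketch of one possible alternative: doing the blend directly in latent space instead of decoding to pixels and re-encoding at every step. This is an assumption about how the masking could be applied, not something confirmed in this thread. All names (`blend`, `masked_denoise`, `denoise_step`, `noise_original`) are hypothetical, and plain Python lists stand in for latent tensors so the sketch runs on its own. With real latents the mask would also need to be resized to the latent resolution (64×64 for a 512×512 image with the Stable Diffusion VAE).

```python
def blend(edited, original, mask):
    """Keep edited values where mask is 1 and original values where mask is 0."""
    return [m * e + (1 - m) * o for e, o, m in zip(edited, original, mask)]

def masked_denoise(lat, orig_lat, mask, timesteps, denoise_step, noise_original):
    """Denoise `lat`, re-imposing the original latent outside the mask after each step.

    denoise_step(lat, t)        -> latent after one denoising step (hypothetical callable)
    noise_original(orig_lat, i) -> original latent noised to the level that matches
                                   the output of step i (hypothetical callable)
    """
    for i, t in enumerate(timesteps):
        lat = denoise_step(lat, t)
        orig_t = noise_original(orig_lat, i)
        lat = blend(lat, orig_t, mask)
    return lat

# Toy stand-ins so the loop can be run: "denoising" halves every value and the
# original is left un-noised. Real versions would wrap the UNet and scheduler.
toy_denoise = lambda lat, t: [x * 0.5 for x in lat]
toy_noise = lambda orig, i: orig

result = masked_denoise(
    lat=[16.0, 16.0],      # starting (noisy) latent
    orig_lat=[1.0, 1.0],   # latent of the original image
    mask=[1, 0],           # 1 = region to edit, 0 = background to keep
    timesteps=[2, 1, 0],
    denoise_step=toy_denoise,
    noise_original=toy_noise,
)
print(result)  # [2.0, 1.0]
```

The masked element follows the denoising trajectory, while the unmasked element is pinned to the original latent at every step, so no pixel-space round trip through the VAE is needed inside the loop.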

ForBo7 · May 23, 2023, 6:20am

In this lesson, it’s mentioned that the reason the “Background” section is included in papers is to impress the reviewers. It reminds me of the following meme.