Lesson 10 official topic

I reran that experiment (with more variations to see what happens) and the results are captured in my notebook here.

https://github.com/johnrobinsn/diffusion_experiments/blob/scenario1/TreeDiffusion_scenario1.ipynb

It didn’t resolve the issue… but a few more observations. Using this blending approach…

0.9 * baseline_noise + 0.1 * new_noise

With small amounts of new noise I still see the degenerate behavior… But adding in larger amounts of new noise (>0.4), I start to see a “diffuse pattern” emerge.

Looking at this, I suspected that the noise was no longer following a normal distribution N(0,1). Indeed, the following doesn’t have a std dev of 1:

c = torch.randn(100000)*0.9+torch.randn(100000)*0.1;c.mean(),c.std()
(tensor(0.0074), tensor(0.9041))
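(Sanity-checking the arithmetic: for independent samples the variances add, so the blend’s std should be √(0.9² + 0.1²) ≈ 0.905, which matches the number above.)

# variance of a*X + b*Y for independent X, Y ~ N(0, 1) is a**2 + b**2
(0.9**2 + 0.1**2) ** 0.5   # 0.9055…, close to the measured 0.9041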

It appears that because the two noise samples are independent, a plain linear interpolation between them doesn’t preserve unit variance… So I tried this instead, which probably still isn’t quite right, but it does give me something very close to a std dev of 1 for small amounts of update_noise:

noise = baseline_noise + torch.randn_like(im_latents)*update_noise

The results of this are also captured in the notebook. But I’m still seeing the degenerate behavior.

Is there a better way to add in small amounts of additional noise and preserve the required noise properties (a valid gaussian sample)?

Thanks Much…
John

4 Likes

I also added a PDF render of this notebook so you can review my results easily without rerunning the notebook… GitHub is intermittently timing out when rendering the notebook, since it has gotten a bit large.

2 Likes

I was following along with the lesson, working on another dataset, and realised that it’s a bit painful to load raw image bytes into multidimensional arrays (lists) purely in Python (obviously) without using libs like PIL & numpy.

After struggling a bit, I got curious about the datatype of the pickle object that we load in the lesson. Apparently, it’s of type numpy.ndarray. We’re not really using the numpy API after loading the data, so I guess it’s fine with the ground rules we’ve set (on not using numpy APIs until we’ve sorta recreated them).

Either way, I had a hard time loading PNGs directly into multidimensional lists, so I’m going to cut myself some slack and use PIL.Image and numpy.asarray for loading up the data. Just this one time. :sweat_smile:
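Something like this (a quick sketch; the filename is just a placeholder):

from PIL import Image
import numpy as np

img = Image.open("some_image.png")   # placeholder path
arr = np.asarray(img)                # H x W (x C) ndarray
pixels = arr.tolist()                # back to plain Python lists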

3 Likes

Yes exactly - that was my theory anyway… Whilst I think it would be interesting and instructive to write a jpg or png decoder from scratch, it does feel rather out of scope!

2 Likes

I’m glad you tried that. I’ve been wondering about the same thing.

What if you scale the entire u + g*(t-u) instead of just the g*(t-u) bit?

BTW I tweeted your post here:

4 Likes

Thanks a lot for the tweet @jeremy !

Very interesting! So I’ve tried two things (hope they include what you had in mind): the bottom line is that your idea of rescaling the whole prediction seems to work amazingly well. The rescale factor I wrote about previously might also help, but that’s less obvious :sweat_smile:.

I’ll try to do more robust experiments with all this, and see if we can prove more robustly that those rescaling factors help.

1. regular guidance (7.5) followed by rescale to match the original t
1.a Reminder: original images
pred = u + g*(t-u)

1.b With the “whole” rescaling
pred_nonscaled = u + g*(t-u)
pred = pred_nonscaled * torch.norm(u)/torch.norm(pred_nonscaled)

→ seems to add a lot of details, without changing the picture!!

2. rescaled guidance update (0.15) followed by rescale to match the original t
2.a Reminder: original images
pred = u + g*(t-u)/torch.norm(t-u)*torch.norm(u)

(note the rider’s foot missing on the right picture)

2.b With the “whole” rescaling
pred_nonscaled = u + g*(t-u)/torch.norm(t-u)*torch.norm(u)
pred = pred_nonscaled * torch.norm(u)/torch.norm(pred_nonscaled)

The whole-term rescaling definitely seems to help here too! Notably, it fixed the foot artifact observed previously.
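For reference, here’s roughly how the two variants above could be collected into one function (a rough sketch of my understanding, not the exact notebook code; u is the unconditional prediction, t the text-conditioned one, and g the guidance scale, roughly 7.5 for variant 1 and 0.15 for variant 2):

import torch

def guided_pred(u, t, g, rescale_update=False, rescale_whole=True):
    # classifier-free guidance with the optional norm rescalings discussed above
    if rescale_update:
        # variant 2: scale the guidance update so its norm matches the unconditional prediction
        pred = u + g * (t - u) / torch.norm(t - u) * torch.norm(u)
    else:
        pred = u + g * (t - u)
    if rescale_whole:
        # rescale the whole prediction back to the norm of the unconditional prediction
        pred = pred * torch.norm(u) / torch.norm(pred)
    return pred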

8 Likes

This looks great.
Seems like increasing 3D normals intensity :slight_smile:

1 Like

I observed something similar, with an increase in detail/texture, when I replaced the constant guidance_scale (orange) with a cosine scheduler (blue), where the guidance_scale value decreases as the number of inference steps increases:

The image on the left is generated with the cosine guidance scale and the one on the right is with a constant guidance scale.
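The scheduler just produces a guidance value per inference step instead of a constant one; the idea is roughly this (a minimal sketch, with example max/min values):

import math

def cosine_guidance(step, num_steps, g_max=7.5, g_min=0.1):
    # guidance scale decays from g_max to g_min along a cosine curve over the inference steps
    return g_min + (g_max - g_min) * 0.5 * (1 + math.cos(math.pi * step / max(num_steps - 1, 1)))

guidance_per_step = [cosine_guidance(i, 60) for i in range(60)]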

14 Likes

This is great!

2 Likes

[meme: “0 days since a community member had a cool insight that is obvious in hindsight”]

Great idea @barnacl!

13 Likes

It’s amazing how things can be obvious in hindsight, yet literally no researcher has ever thought to try it before!

10 Likes

Actually, even linear decay seems to work, but the initial guidance value needs to be higher. At a guidance value of 10 with linear decay, the missing-horse-leg issue also gets resolved, as observed by @sebderhy :slight_smile:

Some experiment results:

  1. Guidance value: 10 , linear decay, 50 steps

  2. Guidance value: 10, linear decay, 30 steps

  3. Guidance value: 10, linear decay, 20 steps

  4. Guidance value: 7.5, linear decay, 50 steps. The generation quality seems worse with lower guidance when using a decay schedule.

  5. Guidance value: 7.5, guidance stays the same for all time steps. Here the horse is missing a leg.
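For reference, the linear decay is essentially just a linspace from the starting guidance value down to a small floor over the inference steps (a sketch; the 0.1 floor here is just an example value):

import torch

num_steps = 50
guidance_per_step = torch.linspace(10.0, 0.1, num_steps)   # one guidance value per inference step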

5 Likes

I did see more texture in the linear case too, but instead of increasing the guidance_scale I used g at 7.5 for 40 inference steps and then reduced it linearly for the next 20 steps.
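In code that’s roughly (a sketch; the final value the decay reaches is an assumption here):

import torch

# g stays at 7.5 for the first 40 steps, then decays linearly over the last 20
g_sched = torch.cat([torch.full((40,), 7.5), torch.linspace(7.5, 0.0, 20)])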

1 Like

I faced the same problem when I tried linear interpolation (lerp) between two noise vectors. Interestingly, spherical linear interpolation (slerp) doesn’t have the degeneration problem you encountered.

Mathematically, the intermediate vectors produced by lerp are shorter than those from slerp, which might be the cause of the degeneration.
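For anyone who wants to try it, here’s a typical slerp sketch for two noise tensors (my own rough version, not from any particular library):

import torch

def slerp(v0, v1, t):
    # spherical linear interpolation between two noise tensors, treated as flat vectors
    a, b = v0.flatten(), v1.flatten()
    omega = torch.acos(torch.clamp(torch.dot(a / a.norm(), b / b.norm()), -1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < 1e-6:
        return (1 - t) * v0 + t * v1   # nearly parallel: fall back to lerp
    return (torch.sin((1 - t) * omega) / so) * v0 + (torch.sin(t * omega) / so) * v1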

3 Likes

Hey, this looks great. A naive question: how did you implement the cosine scheduler in PyTorch? Did you use the same max and min values as in the original scheduler and replace the steps using a cosine schedule?

Awesome work! What was the prompt?

Fastai docs :slight_smile:
It’s pretty much the exact same code.
Max is set to 7.5 and min to 0.1:

import torch
from timm.scheduler import CosineLRScheduler

def get_lr_per_epoch(scheduler, num_epoch):
    lr_per_epoch = []
    for epoch in range(num_epoch):
        lr_per_epoch.append(scheduler.get_epoch_values(epoch))
    return lr_per_epoch

# dummy optimizer: its lr (7.5) acts as the schedule's max, lr_min=0.1 is the floor
optimizer = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=7.5)
num_epoch = 60
sched = CosineLRScheduler(optimizer, t_initial=num_epoch, lr_min=0.1)
lr_per_epoch = get_lr_per_epoch(sched, num_epoch)
4 Likes

Thank you, the prompt was inspired by my dog - “A portrait of a yorkshire terrier with a top hat” :slight_smile:

2 Likes

It makes me so happy when people read our docs :smiley:

5 Likes

I see. Thanks. So just to be clear: you’re getting the guidance scale per step using this method, and you multiply it in at the scheduler’s timestep for that step, in simple terms?