Lesson 10 official topic

I observed a similar increase in detail/texture when I replaced the constant guidance_scale (orange) with a cosine schedule (blue), where the guidance_scale value decreases as the number of inference steps increases:

The image on the left is generated with the cosine guidance scale and the one on the right is with a constant guidance scale.
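Roughly, the schedule looks like this (an illustrative sketch, not my exact implementation; the actual code using timm's CosineLRScheduler is further down the thread):

import math
import torch

# Illustrative cosine decay of the guidance scale over the inference steps
# (values assumed: max 7.5, min 0.1, 60 steps)
num_inference_steps = 60
g_max, g_min = 7.5, 0.1
t = torch.linspace(0, math.pi, num_inference_steps)
guidance_schedule = g_min + (g_max - g_min) * 0.5 * (1 + torch.cos(t))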

14 Likes

This is great!

2 Likes

Meme saying: '0 days since a community member has a cool insight that is obvious in hindsight'

Great idea @barnacl!

13 Likes

It’s amazing how things can be obvious in hindsight, yet literally no researcher has ever thought to try it before!

10 Likes

Actually, even linear decay seems to work, but the initial guidance value needs to be higher. At a guidance value of 10 with linear decay, the missing-horse-leg issue that @sebderhy observed also gets resolved :slight_smile:

Some experiment results (a sketch of the linear-decay schedule follows the list):

  1. Guidance value: 10 , linear decay, 50 steps

  2. Guidance value: 10, linear decay, 30 steps

  3. Guidance value: 10, linear decay, 20 steps

  4. Guidance value: 7.5, linear decay, 50 steps. The generation quality seems bad with lower guidance for a decay schedule.

  5. Guidance value: 7.5, constant for all timesteps. Here the horse is missing a leg.
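A minimal sketch of the linear-decay schedule used in experiments 1–4 (assuming decay from the initial value down to 0; the exact floor wasn't stated):

import torch

# Linearly decaying guidance: starts at 10 and decays to 0 over the steps
num_inference_steps = 50
guidance_schedule = torch.linspace(10.0, 0.0, num_inference_steps)
# Inside the sampling loop, use guidance_schedule[i] in place of the constant scale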

5 Likes

I did see more texture in the linear case too, but instead of increasing the guidance_scale I kept g at 7.5 for the first 40 inference steps and then reduced it linearly over the next 20 steps.
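Something like this, as a sketch (names are illustrative):

import torch

# Hold guidance at 7.5 for 40 steps, then decay linearly to 0 over the last 20
guidance_schedule = torch.cat([torch.full((40,), 7.5), torch.linspace(7.5, 0.0, 20)])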

1 Like

I faced the same problem when I tried linear interpolation (lerp) between two noise vectors. Interestingly, spherical linear interpolation (slerp) doesn't have the degeneration problem you encountered.

Mathematically, the intermediate vectors produced by lerp are shorter than those produced by slerp, which might be the cause of the degeneration.
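For reference, a minimal sketch of the two interpolation schemes (function names here are just illustrative):

import torch

def lerp(a, b, t):
    # Straight-line interpolation: intermediate vectors are shorter than a or b
    return (1 - t) * a + t * b

def slerp(a, b, t, eps=1e-8):
    # Spherical interpolation: intermediate vectors keep roughly the same norm
    a_f, b_f = a.flatten(), b.flatten()
    cos_omega = torch.dot(a_f, b_f) / (a_f.norm() * b_f.norm() + eps)
    omega = torch.acos(cos_omega.clamp(-1 + eps, 1 - eps))
    return (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)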

3 Likes

Hey, this looks great. A naive question: how did you implement the cosine scheduler in PyTorch? Did you use the same max and min values as in the original scheduler and replace the per-step values with a cosine schedule?

Awesome work! What was the prompt?

Fastai docs :slight_smile:
It’s pretty much the exact same code.
Max is set to 7.5 and min is 0.1

from timm.scheduler.cosine_lr import CosineLRScheduler
import torch

# Dummy optimizer: its learning rate (7.5) is the max value of the cosine schedule
optimizer = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=7.5)

def get_lr_per_epoch(scheduler, num_epoch):
    # Collect the scheduled value for each step ("epoch" in timm's terms)
    lr_per_epoch = []
    for epoch in range(num_epoch):
        lr_per_epoch.append(scheduler.get_epoch_values(epoch))
    return lr_per_epoch

num_epoch = 60
sched = CosineLRScheduler(optimizer, t_initial=num_epoch, lr_min=0.1)
lr_per_epoch = get_lr_per_epoch(sched, num_epoch)
4 Likes

Thank you, the prompt was inspired by my dog - “A portrait of a yorkshire terrier with a top hat” :slight_smile:

2 Likes

It makes me so happy when people read our docs :smiley:

5 Likes

I see, thanks. So just to be clear: you are getting the guidance scale per step using this method and multiplying it with the scheduler's timestep value for that step, in simple terms?

Using that to get a guidance_scale value for each of the 60 num_inference_steps, and then using that per-step value in place of the constant guidance_scale at each step. Will share the Colab notebook in a bit.
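In sketch form (assuming the lr_per_epoch list from the code above, and the usual diffusers loop variable names):

# One guidance value per step (timm returns a list per param group, hence [0])
guidance_schedule = [v[0] for v in lr_per_epoch]

# At step i of the sampling loop, instead of a constant guidance_scale:
# noise_pred = noise_pred_uncond + guidance_schedule[i] * (noise_pred_text - noise_pred_uncond)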

3 Likes

Can someone tell me about how long it takes to run a sample prompt on the lower tiers of Colab and Paperspace?

With my Nvidia 1070 it’s 26s using float16. I think this GPU lacks faster fp16 support, but half precision is needed to fit into 8 GB memory.

And are any pretrained teacher/student distillation models available yet?

Thanks!
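For reference, loading the pipeline in half precision typically looks something like this (a sketch assuming the diffusers library and the v1-4 checkpoint):

import torch
from diffusers import StableDiffusionPipeline

# fp16 weights roughly halve the memory footprint, which is what lets SD fit in ~8 GB
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
image = pipe("a photograph of an astronaut riding a horse").images[0]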

Love this! I've often seen people apply a separate super-resolution model to upscale the outputs of Stable Diffusion, and this almost looks like you get super-resolution output for free, haha.

1 Like

Really cool, @sebderhy and @barnacl!

I was able to combine the two techniques and get good results with that as well!

import math
import torch

# Guidance scale decays from guidance_scale down to 0 along a cosine curve
cos_schedule = 0.5 * (1 + torch.cos(torch.linspace(0, math.pi, num_inference_steps)))
cos_schedule = guidance_scale * cos_schedule
...
# In the loop: u = unconditional noise pred, t = text-conditioned pred, i = step index
noise_pred_nonscaled = u + cos_schedule[i] * (t - u) / torch.norm(t - u) * torch.norm(u)
noise_pred = noise_pred_nonscaled * torch.norm(u) / torch.norm(noise_pred_nonscaled)

I haven’t had a chance yet but it would be interesting to compare the different techniques side by side. Maybe someone else can try it!

1 Like

I actually did this (I’ll put the notebook up in a bit and link to it) but couldn’t see much of a difference … might have been the particular image or just my old eyes but I wasn’t seeing the increase in detail that others were seeing …

Maybe somebody else can take a look at my notebook and tell me where I’m going wrong :slight_smile:

Very cool!

I think we need to build a proper benchmark for Stable Diffusion; we can't draw conclusions from two images. I will try to start a thread on this soon.

BTW, I feel like this weighted-average step is kind of a patch to compensate for a weakness of SD: why can't we do inference only on the conditional version and get good results? Why should we ever have to hide information from the network? I think that sooner or later, someone will improve SD to get good results without doing two passes through the network at each step.
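For context, the "two passes" refers to the standard classifier-free guidance update, which evaluates the model on both the unconditional and the text-conditioned input at every step. A rough, runnable illustration with a dummy stand-in for the UNet:

import torch

def unet(latent, emb):
    # Dummy model for illustration only; in practice this is the SD UNet
    return latent + emb.mean()

latents = torch.randn(1, 4, 64, 64)
uncond_emb, text_emb = torch.randn(77, 768), torch.randn(77, 768)
noise_pred_uncond = unet(latents, uncond_emb)   # pass 1
noise_pred_text = unet(latents, text_emb)       # pass 2
guidance_scale = 7.5
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)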

1 Like

I’ve added a notebook with all the different guidance variations discussed in this thread so far with code for each as well as the final output compared. The notebook also contains links to each post in this thread which provided a new method.

If I’ve missed any, please let me know and I’ll update :slight_smile:

Here’s a sample of the final output … sorry about the duplicated output in the second image — in a rush.


Edit: @akash5474 Here's the notebook I mentioned earlier. And I was wrong about there not being differences; I think I had the code wrong or something in my original work. I re-did the notebook today and I do see the differences between the various options, so I'm retracting what I said earlier …

I couldn’t get your code to work the way you posted but it worked with a minor modification — please let me know if I missed something.

7 Likes