I observed a similar increase in detail/texture when I replaced the constant guidance_scale (orange) with a cosine schedule (blue), where the guidance_scale value decreases as the number of inference steps increases:
The image on the left is generated with the cosine guidance scale and the one on the right is with a constant guidance scale.
![cosine vs. constant guidance scale comparison](upload://Akk7o65vKXGKdcGJcJ3SYV0Xxr7.png)
Actually, even linear decay seems to work, but the initial guidance value needs to be higher. At a guidance value of 10 with linear decay, the horse's missing-leg issue also gets resolved, as observed by @sebderhy.
I saw more texture in the linear case too, but instead of increasing the guidance_scale, I kept it at 7.5 for 40 inference steps and then reduced it linearly over the next 20 steps.
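For reference, the hold-then-decay schedule looks roughly like this (a minimal NumPy sketch; the final value of 1.0 is an assumption on my part, not something stated above):

```python
import numpy as np

def hold_then_linear_decay(num_steps=60, hold_steps=40, g_start=7.5, g_end=1.0):
    """Hold guidance at g_start, then decay linearly to g_end.

    g_end=1.0 is a placeholder; pick whatever floor works for your prompts.
    """
    hold = np.full(hold_steps, g_start)
    decay = np.linspace(g_start, g_end, num_steps - hold_steps)
    return np.concatenate([hold, decay])

schedule = hold_then_linear_decay()  # one guidance value per inference step
```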
I faced the same problem when I tried linear interpolation (lerp) between two noise vectors. Interestingly, spherical linear interpolation (slerp) doesn't have the degeneration problem you encountered.
Mathematically, intermediate vectors produced by lerp are shorter than those from slerp, which might be the cause of the degeneration.
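To make the length difference concrete, here's a minimal slerp sketch (NumPy for illustration; the same formula applies to flattened PyTorch noise tensors):

```python
import numpy as np

def slerp(t, v0, v1):
    """Spherical linear interpolation between vectors v0 and v1 at fraction t.

    Unlike lerp, slerp moves along the arc between the vectors, so the
    intermediate vectors keep their norm instead of shrinking toward the chord.
    """
    # Normalized copies are used only to measure the angle between the vectors
    u0 = v0 / np.linalg.norm(v0)
    u1 = v1 / np.linalg.norm(v1)
    theta = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if theta < 1e-4:  # nearly parallel vectors: lerp is numerically fine
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

For two orthogonal unit noise vectors, the lerp midpoint has norm about 0.707 while the slerp midpoint stays at norm 1, which is exactly the shortening effect described above.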
Hey, this looks great. A naive question: how did you implement the cosine scheduler in PyTorch? Did you use the same max and min values as in the original scheduler and replace the steps using a cosine schedule?
I see, thanks. So just to be clear: you compute the guidance scale per step using this method and multiply it with the scheduler timestep value for that step, in simple terms?
Using that to get a guidance_scale value for each of the 60 num_inference_steps, then using it in place of the constant guidance_scale at each step. Will share the Colab notebook in a bit.
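Roughly, the schedule is something like this (a NumPy sketch; the g_max/g_min values here are placeholders, not necessarily the ones used above):

```python
import numpy as np

def cosine_guidance_schedule(num_inference_steps=60, g_max=9.0, g_min=1.0):
    """Cosine decay of guidance_scale from g_max (step 0) to g_min (last step)."""
    steps = np.arange(num_inference_steps)
    # Half-cosine goes from 1 to 0 as steps run from first to last
    cos_term = 0.5 * (1 + np.cos(np.pi * steps / (num_inference_steps - 1)))
    return g_min + (g_max - g_min) * cos_term

schedule = cosine_guidance_schedule()
# In the denoising loop, use schedule[i] where the constant guidance_scale was.
```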
Love this! I have often seen people apply a separate super-resolution model to upscale the outputs of Stable Diffusion, and this almost looks like you get super-resolution output for free haha
I actually did this (I’ll put the notebook up in a bit and link to it) but couldn’t see much of a difference … might have been the particular image or just my old eyes but I wasn’t seeing the increase in detail that others were seeing …
Maybe somebody else can take a look at my notebook and tell me where I’m going wrong
I think we need to build a proper benchmark for Stable Diffusion; we can't draw conclusions from two images. I will try to start a thread on this soon.
BTW, I feel like this weighted-average step is kind of a patch to compensate for a weakness of SD: why can't we run inference only on the conditional version and get good results? Why should we ever have to hide information from the network? I think that sooner or later, someone will improve SD to get good results without doing two passes through the network at each step.
I’ve added a notebook with all the different guidance variations discussed in this thread so far with code for each as well as the final output compared. The notebook also contains links to each post in this thread which provided a new method.
If I’ve missed any, please let me know and I’ll update.
Here’s a sample of the final output … sorry about the duplicated output in the second image — in a rush.
Edit: @akash5474 Here’s the notebook I mentioned earlier. And I was wrong about there not being differences; I think I had the code wrong in my original work. I re-did the notebook today and I do see differences across the various options, so I’m retracting what I said earlier.
I couldn’t get your code to work the way you posted but it worked with a minor modification — please let me know if I missed something.