Lesson 10 official topic

suvash · October 24, 2022, 8:34pm

I was following along lesson working on another dataset and realised that it’s a bit painful to load raw image bytes to multidimensional arrays(lists) purely in Python (obviously) without using libs like PIL & numpy.

After struggling a bit, I got curious about the datatype of the pickle object that we load in the lesson. Apparently, it’s of the type numpy.ndarray. We’re not really using numpy API after loading the data, so I guess it’s fine with the ground rules we’ve set(on not using numpy apis until we’ve sorta recreated them).

Either ways, I had a hard time loading PNGs directly to multidimensional lists, so I’m going to cut myself some slack and use PIL.Image and numpy.asarray for loading up the data. Just this one time.

jeremy · October 24, 2022, 9:56pm

Yes exactly - that was my theory anyways… Whilst I think it would be interesting and instructive to write a jpg or png decoder from scratch, it does feel rather out of scope!

jeremy · October 24, 2022, 9:58pm

I’m glad you tried that. I’ve been wondering about the same thing.

What if you scale the entire u + g*(t-u) instead of just the g*(t-u) bit?

BTW I tweeted your post here:

sebderhy · October 25, 2022, 2:48am

Thanks a lot for the tweet @jeremy !

Very interesting! so I’ve tried 2 things (hope they include what you had in mind): the bottom line is that your idea of rescaling the whole seems to work amazingly. The rescale factor I wrote previously might also help, but that’s less obvious .

I’ll try to do more robust experiments with all this, and see if we can prove more robustly that those rescaling factors help.

1. regular guidance (7.5) followed by rescale to match the original t
1.a Reminder: original images
pred = u + g*(t-u)

1.b With the “whole” rescaling
pred_nonscaled= u + g*(t-u)
pred = pred * torch.norm(u)/torch.norm(pred_nonscaled)

→ seems to add a lot of details, without changing the picture!!

2. rescaled guidance update (0.15) followed by rescale to match the original t
2.a Reminder: original images
pred_nonscaled= u + g*(t-u)/torch.norm(t-u)*torch.norm(u)

(note the rider’s foot missing on the right picture)

2.b With the “whole” rescaling
pred_nonscaled= u + g*(t-u)/torch.norm(t-u)*torch.norm(u)
pred = pred_nonscaled * torch.norm(u)/torch.norm(pred_nonscaled)

The whole term rescaling definitely seems to help here too! Notably, it fixed the foot artifact observed previously

arunoda · October 25, 2022, 3:47am

This looks great.
Seems like increasing 3D normals intensity

barnacl · October 25, 2022, 4:47am

I observed something similar with the increase in detail/texture when i replaced the constant guidance_scale (orange) with a cosine scheduler(blue)(guidance_scale values decreases as the number of inference steps increases):

The image on the left is generated with the cosine guidance scale and the one on the right is with a constant guidance scale.
(upload://Akk7o65vKXGKdcGJcJ3SYV0Xxr7.png)

jeremy · October 25, 2022, 4:50am

This is great!

johnowhitaker · October 25, 2022, 5:03am

meme saying '0 days since a community member has a cool insight that is obvious in hindsight

Great idea @barnacl!

jeremy · October 25, 2022, 5:08am

It’s amazing how things can be obvious in hindsight, yet literally no researcher has ever thought to try it before!

namrata · October 25, 2022, 5:26am

Actually, even linear decay seems to work. But initial guidance value needs to be higher. At guidance value 10 and linear decay, the horse missing leg issue also gets resolved as observed by @sebderhy

Some experiment results:

Guidance value: 10 , linear decay, 50 steps

Screenshot 2022-10-25 at 10.47.45 AM1276×1260 109 KB
Guidance value: 10, linear decay, 30 steps

Screenshot 2022-10-25 at 10.47.58 AM1282×1246 116 KB
Guidance value: 10, linear decay, 20 steps

Screenshot 2022-10-25 at 10.48.11 AM1302×1234 113 KB
Guidance value: 7.5, linear decay, 50 steps. The generation quality seems bad with lower guidance for a decay schedule.

Screenshot 2022-10-25 at 10.49.10 AM1386×1242 110 KB
Guidance value: 7.5, guidance remains same for all time steps. Here horse is missing a leg.

Screenshot 2022-10-25 at 10.49.53 AM1262×1242 110 KB

barnacl · October 25, 2022, 5:35am

I did see more texture in the linear case too, but instead of increasing the guidance_scale i used g at 7.5 for 40 inference steps and then reduced it linearly for the next 20 steps.

puru · October 25, 2022, 6:03am

I faced the same problem when I tried linear interpolation (lerp) b/w two noise vectors. interestingly, spherical linear interpolation (slerp) doesn’t have the degeneration problem you encountered.

mathematically, intermediate vectors that come from lerp are shorter in length than those from slerp which might be the cause of degeneration.

averma · October 25, 2022, 7:56am

Hey, This looks great. A naive question, how did you implement the cosine scheduler in pytorch. Did you use the same max and min values as in the original scheduler and replaced the steps by using cosine scheduler?

ilovescience · October 25, 2022, 8:44am

Awesome work! what was the prompt?

barnacl · October 25, 2022, 9:07am

Fastai docs
It’s pretty much the exact same code.
Max is set to 7.5 and min is 0.1

def get_lr_per_epoch(scheduler, num_epoch):
    lr_per_epoch = []
    for epoch in range(num_epoch):
        lr_per_epoch.append(scheduler.get_epoch_values(epoch))
    return lr_per_epoch
num_epoch = 60
sched = CosineLRScheduler(optimizer, t_initial=num_epoch,lr_min=0.1)
lr_per_epoch = get_lr_per_epoch(sched, num_epoch)

barnacl · October 25, 2022, 9:08am

Thank you, the prompt was inspired by my dog - “A portrait of a yorkshire terrier with a top hat”

jeremy · October 25, 2022, 10:23am

It makes me so happy when people read our docs

averma · October 25, 2022, 10:41am

I see. Thanks. So just to get it clear, you are getting the guidance scale per step using this method and you are multiplying this with the scheduler timestep value for that step, in simple terms?

barnacl · October 25, 2022, 3:49pm

Using that to get a guidance_scale value for each of the 60 num_inference_steps and then using it instead of the constant guidance_scale using it for each step. Will share the colab notebook in a bit

Pomo · October 26, 2022, 1:40am

Can someone tell me about how long it takes to run a sample prompt on the lower tiers of Colab and Paperspace?

With my Nvidia 1070 it’s 26s using float16. I think this GPU lacks faster fp16 support, but half precision is needed to fit into 8 GB memory.

And are any pretrained teacher/student distillation models available yet?

Thanks!