Help on Stable Diffusion Resume Friendly Workflow

I’m trying to create a resumable workflow for generating images. Basically, here’s what I’m trying to do:

  • We generate an image with 50 steps
  • Instead of waiting for 50 steps, I’ll abort & get latents on the 20th step
  • Then if I want, I will resume the rest of the steps
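
To make the intended behaviour concrete, here is a minimal sketch that uses a toy, stateless update in place of the real UNet and scheduler. The function name `run` and the update rule are made up for illustration; only the stop/resume bookkeeping mirrors the plan above.

```python
def run(steps, latents, resume_step=-1, stop_at_step=-1):
    """Toy stand-in for the denoising loop (not the real UNet/scheduler)."""
    timesteps = range(steps, 0, -1)  # steps, steps - 1, ..., 1
    for i, t in enumerate(timesteps):
        if i < resume_step:
            continue  # skip steps that an earlier run already completed
        latents = 0.9 * latents + 0.01 * t  # hypothetical stateless update
        if i == stop_at_step:
            return latents  # latents after step index i
    return latents


full = run(50, 1.0)                          # uninterrupted 50-step run
partial = run(50, 1.0, stop_at_step=19)      # stop after the 20th step (index 19)
resumed = run(50, partial, resume_step=20)   # continue from index 20
print(full == resumed)
```

In this toy the resumed run applies exactly the same updates in the same order as the full run, so the two results agree. That is the behaviour I expected from the real pipeline as well.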

I tried to do it but somehow it won’t produce the original result when resumed. See:

  • 1st image is the image after 50 steps
  • 2nd set of images are the intermediate latents
  • 3rd is the image after I resumed

Here’s the code for the generation:

import torch
from tqdm.auto import tqdm

# Assumes `unet` and `scheduler` are already loaded as globals, as in the notebook.
def gen_image(text_embs, height=512, width=512, steps=50, gd=7.5, seed=100, get_all=False, return_preview=False, stop_at_step=-1, input_latents=None, resume_step=-100):
    torch.manual_seed(seed)
    latents = torch.randn(len(text_embs)//2, unet.in_channels, height // 8, width // 8).to("cuda").half()

    scheduler.set_timesteps(steps)

    if input_latents is None:
        # fresh run: scale the initial noise for this scheduler
        latents = latents * scheduler.init_noise_sigma
    else:
        # resumed run: continue from latents saved by an earlier call
        latents = input_latents

    latents_list = []

    for i, t in enumerate(tqdm(scheduler.timesteps)):
        # skip the steps that the earlier run already completed
        if i < resume_step:
            continue

        # duplicate the latents for the unconditional and text-conditioned passes
        model_input = torch.cat([latents] * 2)
        model_input = scheduler.scale_model_input(model_input, t)

        # predict the noise residual
        with torch.no_grad():
            pred = unet(model_input, t, encoder_hidden_states=text_embs).sample

        # perform guidance
        pred_uncond, pred_text = pred.chunk(2)
        pred = pred_uncond + gd * (pred_text - pred_uncond)

        # compute the "previous" noisy sample
        updated_info = scheduler.step(pred, t, latents)
        latents = updated_info.prev_sample

        # abort early: return the preview and the latents after step i
        if stop_at_step == i:
            return updated_info.pred_original_sample, latents

        if get_all:
            latents_list.append(updated_info.pred_original_sample if return_preview else latents)

    if get_all:
        return latents_list

    return latents

And here’s the complete notebook.

Any idea on why this is not working well?

My brain’s a little mush at the moment and so looking at your code, I can’t really see what the issue is but I believe I did exactly what you’re trying to do — resume after a certain number of steps using the latents from a previous run.

My notebook with that functionality is here — what you need is the get_noise method which is passed the latents from a previous run which was interrupted after 10 - 20 steps. Maybe it’ll help you figure out what might be going wrong with your code?

If you do, please let me know since I’d like to know :slight_smile: I’ll take a look later after my brain has had a chance to recover …

Thanks. I’ll go through the code & check.

BTW: I’m super impressed with how you are doing diff-edit.
I didn’t think of that trick before :slight_smile:
Gonna try that as well :slight_smile:

Thank you :slight_smile: I got stuck on the mask generation though and then I’ve been sidetracked for the last couple of days doing other stuff. But I guess every bit you learn helps, right?

Somebody else had gotten the mask part done and they were kind enough to share the notebook here and so I’m thinking of trying that out … maybe tomorrow?

Let me know how you get on with this and if it helps. Hope it does!

I fixed my issue with some testing:

Had to change the steps as shown below.

Cool! Glad you got it sorted :slight_smile:

It works, but I’m not sure why we need to do this :slight_smile:

The changes you made were to change steps to 49 and resume_step to 26 right? What if you just made resume_step 26 only? Does it still work? Or do you need both changes?

Nope. It had to be steps - 1 every time. 26 makes sense, because we already completed the 25th step.

Yep, 26 makes sense. That was why I was wondering about the 49. I don’t know why it has to be 49 and not 50 … But then again, I’m not looking at the code and am simply going by values. Should probably look at the code …

But have you tried changing the stop_at_step value to see what happens? For example, if you change that to 20 do you still have to set steps to 49 for the resume run?

This would do nothing right?

Maybe I misunderstood, but I thought setting steps to 49 instead of 50 was what fixed the issue you were seeing, right? I was just curious why it had to be 49 and couldn’t be 50, and was wondering if that was due to the number of steps remaining. So if you change stop_at_step to 20, then for the next run you’d have resume_step set to 21. If steps now works with 50 (instead of 49), then you know that the number of steps remaining has something to do with the issue, right?

Just wondering since you mentioned that you have no idea why it works at steps set to 49 and not 50 … But maybe I misunderstood?
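
The experiment I have in mind could be sketched roughly like this. `sweep` and `toy_gen` are hypothetical names; `gen` stands for any wrapper around your gen_image that takes the same keyword arguments and returns only the final latents (your function returns a tuple when it stops early, so the wrapper would need to unpack that). The toy generator is there only so the harness runs on its own, and it is not a model of the real scheduler.

```python
def sweep(gen, total_steps, stop_points, distance=lambda a, b: abs(a - b), tol=1e-6):
    """Resume from each stop point with `total_steps` and `total_steps - 1`
    and record whether the result matches the uninterrupted run."""
    reference = gen(steps=total_steps)
    results = {}
    for stop in stop_points:
        partial = gen(steps=total_steps, stop_at_step=stop)
        for resume_steps in (total_steps, total_steps - 1):
            resumed = gen(steps=resume_steps, input_latents=partial,
                          resume_step=stop + 1)
            results[(stop, resume_steps)] = distance(resumed, reference) <= tol
    return results


def toy_gen(steps=50, stop_at_step=-1, input_latents=None, resume_step=-100):
    """Hypothetical stand-in for the real pipeline: a stateless scalar update."""
    latents = 1.0 if input_latents is None else input_latents
    for i, t in enumerate(range(steps, 0, -1)):
        if i < resume_step:
            continue
        latents = 0.9 * latents + 0.01 * t
        if i == stop_at_step:
            break
    return latents


results = sweep(toy_gen, total_steps=50, stop_points=[20, 25])
for (stop, resume_steps), ok in sorted(results.items()):
    print(f"stop_at_step={stop} resume_step={stop + 1} steps={resume_steps} matches={ok}")
```

With real tensors you would pass a different `distance`, for example something like `lambda a, b: (a - b).abs().max().item()`. If the steps - 1 runs are the only ones that match for every stop point, the stop position is probably not the cause and the scheduler setup is the place to look.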