Lesson 10 official topic

This is a wiki post - feel free to edit to add links from the lesson or other useful info.

<<< Lesson 9 | Lesson 11 >>>

Lesson resources

Links from the lesson


Can we get a copy of your OneNote at the end please?


Hi Jeremy! I recently tried running the stable_diffusion.ipynb notebook locally (after a few days of struggling with CUDA setup). Running inference on the Stable Diffusion model gives a CUDA out-of-memory error. If anyone has insights on how to solve it (I can’t reduce the batch size here, can I?), that would be super helpful! Thanks.


What’s your GPU? Are you using float16?
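For context on why float16 matters here: halving precision from float32 to float16 halves the memory every weight and activation occupies. A minimal sketch in plain PyTorch (no model download) using a tensor shaped like a typical SD latent batch; in diffusers, the documented way to get this is passing `torch_dtype=torch.float16` to `StableDiffusionPipeline.from_pretrained`:

```python
import torch

# A tensor shaped like one Stable Diffusion latent: (1, 4, 64, 64)
latents_fp32 = torch.zeros(1, 4, 64, 64, dtype=torch.float32)
latents_fp16 = latents_fp32.half()

bytes_fp32 = latents_fp32.element_size() * latents_fp32.nelement()
bytes_fp16 = latents_fp16.element_size() * latents_fp16.nelement()

print(bytes_fp32, bytes_fp16)  # fp16 uses exactly half the bytes
```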


I reduced the image resolution to run it in Paperspace.


Hi Tanishq, I have an NVIDIA GeForce GTX 1650 Ti, and yes, I am using float16 too.

I should also mention that the same code (an image generation of “astronaut on a horse”) ran easily on a Paperspace free GPU instance, but didn’t work locally.

@jeremy has uploaded it here - Lesson 9 official topic - #188


That card has 4 GB of VRAM IIRC. It’s a bit too little for Stable Diffusion, but perhaps you’d get lucky if you use pipe.enable_attention_slicing() after you create the pipeline. Could you try that out?
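For intuition on why that helps: the UNet’s self-attention materializes a (heads × tokens × tokens) score matrix, and attention slicing computes it one slice of heads at a time, trading speed for peak memory. A rough back-of-the-envelope sketch — the numbers are illustrative assumptions, not the exact diffusers internals:

```python
def attention_peak_bytes(heads, tokens, bytes_per_el=2, slice_size=None):
    """Peak memory of the attention score matrix, optionally sliced over heads."""
    h = heads if slice_size is None else slice_size
    return h * tokens * tokens * bytes_per_el

# A 512x512 image becomes a 64x64 latent -> 4096 tokens at the first UNet block
full = attention_peak_bytes(heads=8, tokens=4096)                   # all heads at once
sliced = attention_peak_bytes(heads=8, tokens=4096, slice_size=1)   # one head at a time

print(full // 2**20, "MiB vs", sliced // 2**20, "MiB")  # 256 MiB vs 32 MiB
```

The real call is just `pipe.enable_attention_slicing()` after creating the pipeline, as mentioned above.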


I could run the Deep Dive notebook locally on a 1080 (8 GB) by breaking up some cells here. See if this helps you to run it on the 1650.

EDIT: 4 GB is very low. When I checked nvidia-smi, it’s usually at least 5 GB used, so it might not help


That’s right, it has 4 GB, which I suspected was too low! Thanks for the advice, I’ll try that out. The thought process was, “Can I get the SD model to run inference locally, no matter how slowly?” Just looking for ways to do that, much as we play with batch sizes and/or gradient accumulation in image classification models.
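On the gradient-accumulation analogy: for training, splitting a batch into micro-batches and summing their (scaled) gradients before stepping reproduces the large-batch gradient while holding less in memory at once. A minimal PyTorch sketch with a hypothetical tiny model, just to show the pattern:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)
x, y = torch.randn(8, 10), torch.randn(8, 1)

# Full-batch gradient, for comparison
model.zero_grad()
torch.nn.functional.mse_loss(model(x), y).backward()
full_grad = model.weight.grad.clone()

# Same gradient from 4 micro-batches of 2, accumulated
model.zero_grad()
for i in range(0, 8, 2):
    xb, yb = x[i:i + 2], y[i:i + 2]
    # Scale each micro-batch loss so the sum matches the full-batch mean
    loss = torch.nn.functional.mse_loss(model(xb), yb) / 4
    loss.backward()  # gradients accumulate in .grad until zero_grad()

assert torch.allclose(full_grad, model.weight.grad, atol=1e-6)
```

For pure inference there is no gradient to accumulate, which is why reducing resolution, using float16, and attention slicing are the usual levers instead.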


Latent noise looks different from TV-static-style grain. Is that what you meant when you showed us the Jeremy Howard image? Thanks.

Ah, that’s interesting! Thank you for linking your version of it, I’ll check it out after the lecture 🙂

> 4 GB is very low. When I checked nvidia-smi, it’s usually at least 5 GB used, so it might not help

This is probably not the right place, but I can’t think of a better thread right now (I’ll move the discussion if needed). Random thought: it should in theory be possible to apply the VAE + UNet to the latents for tasks like image segmentation as well, and maybe get faster results. Wondering what the group thinks about it.
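On the speed intuition behind that idea: SD’s VAE compresses a 512×512×3 image into a 64×64×4 latent (8× spatial downsampling, 4 channels), so anything run in latent space processes roughly 48× fewer values per image. A quick sanity check of that factor, using the shapes from the Stable Diffusion setup discussed in the lesson:

```python
pixel_elems = 512 * 512 * 3   # RGB image in pixel space
latent_elems = 64 * 64 * 4    # VAE latent: 8x smaller per side, 4 channels

print(pixel_elems / latent_elems)  # 48.0
```

Whether segmentation quality survives the lossy VAE round-trip is the open question, of course.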

Oh, okay. Paperspace/Lambda it is, then! Thanks for your answers.


Has anyone tried running the notebooks with an NVIDIA GeForce RTX 3070 Mobile / Max-Q (8 GB)?

From the ‘Progressive Distillation’ paper, showing quality (lower is better) at different numbers of steps, comparing their distilled version with a non-distilled model sampled with DDIM. You can see the original models need more steps to reach decent quality.


Links to both papers discussed: