Hi Jeremy! I recently tried running the stable_diffusion.ipynb notebook locally (after a few days of struggling with CUDA setup). Running inference on the stable diffusion model gives a CUDA out of memory error. If anyone has any insights on how to solve it (I can't reduce the batch size here, can I?), that would be super helpful! Thanks.
I should also mention that the same code (image generation for “astronaut on a horse”) ran easily on a Paperspace free GPU instance, but didn’t work locally.
That card has 4 GB of VRAM IIRC. It’s a bit too little for Stable Diffusion, but perhaps you’d get lucky if you use pipe.enable_attention_slicing() after you create the pipeline. Could you try that out?
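Something like this is what I had in mind — a rough sketch using the Hugging Face diffusers pipeline (the model id and prompt are just examples, adjust to whatever the notebook uses):

```python
# Sketch: fp16 weights plus attention slicing is usually what gets SD close to 4 GB of VRAM.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,   # half precision roughly halves the memory for the weights
).to("cuda")

pipe.enable_attention_slicing()  # compute attention in slices instead of all at once

image = pipe("an astronaut riding a horse").images[0]
image.save("astronaut.png")
```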
That’s right, it has 4 GB, which I suspected was too low! Thanks for the advice, I’ll try that out. The thought process was, “Can I get the SD model to run inference locally, no matter how slowly?”. I’m just looking for ways to do that, much as we play with batch sizes and/or gradient accumulation in image classification models.
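In the “no matter how slowly” spirit, another option I might try (depending on the diffusers version, and assuming accelerate is installed) is offloading the model to CPU and only moving submodules to the GPU as they run. A rough, untested sketch:

```python
# Sketch: trade speed for memory by keeping weights on the CPU and streaming them to the GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe.enable_attention_slicing()
pipe.enable_sequential_cpu_offload()  # note: don't call .to("cuda") when using offload

image = pipe("an astronaut riding a horse", num_inference_steps=30).images[0]
```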
Probably not the right place, but I can’t think of a better thread right now (will move the discussion if needed). Random thought: it should in theory be possible to apply the VAE + UNet approach of working on latents to tasks like image segmentation as well, perhaps for faster results. Wondering what the group thinks about it.
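To make the idea concrete, here’s a rough sketch of the round trip through the VAE latent space (using diffusers’ AutoencoderKL; the 0.18215 scaling factor is the one SD uses, and the file name is just a placeholder). The point is that a downstream model could in principle operate on the 4×64×64 latents instead of 3×512×512 pixels:

```python
# Sketch: encode an image into SD's latent space and decode it back,
# to illustrate that downstream tasks could work on the much smaller latents.
import numpy as np
import torch
from diffusers import AutoencoderKL
from PIL import Image

vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae").to("cuda")

img = Image.open("input.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0   # scale pixels to [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0).to("cuda")               # shape (1, 3, 512, 512)

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample() * 0.18215   # shape (1, 4, 64, 64)
    recon = vae.decode(latents / 0.18215).sample             # back to pixel space
```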
From the ‘Progressive Distillation’ paper, showing quality (lower is better) at different numbers of sampling steps, comparing their distilled version with a non-distilled model sampled with DDIM. You can see the original models need more steps to reach decent quality.