Lesson 9 (part 2) preview

We are making the full videos of lessons 9 and 10 of the new “From Deep Learning Foundations to Stable Diffusion” course available as a special preview of the new course! Here are all the resources for lesson 9 (which includes 3 videos):

Lesson resources

Links from the lesson


This is so great! Thanks, Jeremy :slight_smile: I had resigned myself to waiting a few months to have access to the course. But just having these videos gives me so much to work with! There goes the rest of my day, but I’m so looking forward to watching the two videos and learning from them!


A few bits of feedback on the accompanying notebook(s) — at present they seem to be set up for CUDA only. But the code should work just as well on an Apple Silicon Mac (or even on an Intel Mac, though extremely slowly) with just a simple change :slight_smile:

If you add the following line in the second cell after the imports:

device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

Then all you need to do is change any other cells which have .to("cuda") to .to(device), and the code will work on any supported GPU/CPU setup.
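If you want to keep the selection logic in one place, the same fallback can be factored into a tiny helper. This is just an illustrative sketch (pick_device is a name I'm making up); the two boolean arguments stand in for torch.cuda.is_available() and torch.backends.mps.is_available():

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then plain CPU.

    In a notebook you'd call it as:
        device = pick_device(torch.cuda.is_available(),
                             torch.backends.mps.is_available())
    """
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"
```

Then every later cell just uses `device` and never mentions a specific backend again.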

Also, if you already have the Hugging Face Stable Diffusion model downloaded, you can simply set up a symlink (this works on any platform: macOS, Linux, or Windows) pointing to the “stable-diffusion-v1-4” folder from the location where you have your notebook. Of course, if you are on Colab, it’s easier to download the model all over again — though there’s also a solution there by using a connected Google Drive, but I won’t go into that :stuck_out_tongue:

So if you have the Hugging Face Stable Diffusion model at /Users/myuser/stable-diffusion-v1-4/, then you can simply switch to the folder where you have the Jupyter notebooks and run the following (on Linux/macOS, the Windows command is slightly different):

ln -s /Users/myuser/stable-diffusion-v1-4/ stable-diffusion-v1-4

That’ll create a folder pointing to the original location and save you from using up several gigabytes of disk space again :slightly_smiling_face:
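If you'd rather do this from a setup cell in Python (which then works unchanged on Linux, macOS, and Windows), here's a minimal sketch; link_model is a hypothetical helper name, not part of any library:

```python
from pathlib import Path

def link_model(src, name="stable-diffusion-v1-4"):
    """Create a symlink `name` in the current directory pointing at the
    already-downloaded model folder `src`, unless something by that name
    already exists. Returns the Path of the link."""
    link = Path(name)
    if not link.exists():
        link.symlink_to(Path(src).expanduser().resolve())
    return link
```

Note that on Windows, creating symlinks may require Developer Mode or elevated privileges.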

Then, you’d have to change the following line (or similar ones) from the notebook, to point to your folder instead of the model from the Hugging Face hub, as follows:

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16).to(device)

The above should be changed to:

pipe = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16).to(device)

Notice that you are pointing to the directory (or the symlink to the directory) where the models are on your local drive.

And finally, if you are on macOS, you should also drop the float16 parts, since float16 isn’t currently supported correctly on macOS. So drop the following from the line above:

, revision="fp16", torch_dtype=torch.float16

To get this as your final line (but only on macOS):

pipe = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-4").to(device)
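If you'd rather not maintain two versions of that cell, the macOS special-casing can be folded into the device logic. A sketch, assuming the `device` string from earlier; fp16_kwargs is an illustrative name I'm introducing:

```python
def fp16_kwargs(device):
    """Extra from_pretrained() arguments: request the fp16 weights only on
    CUDA; MPS and CPU need the default float32 weights for now."""
    if device == "cuda":
        import torch  # only needed when we actually pass a dtype
        return {"revision": "fp16", "torch_dtype": torch.float16}
    return {}
```

Then a single cell works everywhere: `pipe = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-4", **fp16_kwargs(device)).to(device)`.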


Just finished the first video (Lesson 9) and I wanted to leave my initial feedback/impressions …

I’ve been working with Stable Diffusion code for a good two months now (not to modify the functionality but to create GUIs/output), but only vaguely understood terms such as VAE, latents, timestep, or beta. I knew what some of them were supposed to be/do, but not how they worked or how it all tied together. I finally understand all of this so much better after the first lecture. So, a great big thank you from me :slightly_smiling_face:

Also, I’ve been struggling with performance on a macOS since PyTorch/diffusers has lately become much slower to generate images on Apple Silicon and so I wanted to find out how to make things better. The Tensorflow version of Stable Diffusion gives much better performance but is missing some features I’d like to see and I didn’t have enough knowledge to figure out how to implement those myself. So, you mentioning that you’ll be implementing the code in pure Python made me prick up my ears :slight_smile:

I’m now impatient to see how that part works out, though I realize I might not be able to see it all till the full course is out. But again, thank you — this has been a huge revelation in terms of how much knowledge I’ve gained in a single video!


Hi folks, I have started to look into Lesson 9 video. Where is the main notebook that @jeremy showed in the video? Specifically the part where he is talking about guidance scale, negative prompts, init image, textual inversion, Dreambooth etc.

I got the course repo but do not think it is there.

I believe it’s in the diffusion-nbs repo, which is also linked above. It should be in one of the links that Jeremy provided up there, if that isn’t the right one …


Hi, I was using CLIP in some of my projects, and I was getting a single embedding of size 768 per text (or image), meaning a single semantic point per text, which looks pretty reasonable. Stable Diffusion instead gets a matrix of shape (77, 768), which is one point per token. I don’t have an intuition for why that is or how it works. Thanks


Is there a specific reason the UNet predicts the noise, instead of predicting the image with some noise reduced? Is it because we want to have control over what percentage of the predicted noise we actually subtract from the image?

What is the intuition behind using a UNet here? Is it “segmenting” noise from non-noise, or is there another, deeper reason that the skip connections are needed?