Lesson 24 official topic

This is a wiki post - feel free to edit to add links from the lesson or other useful info.

<<< Lesson 23 Lesson 25 >>>

Lesson resources

Links from the lesson


Thanks for a great lesson! There is a lot to digest in this one; I think it might take me a while, but it's great to see it all coming together.


There is one downside to using einops and torch.einsum: they have historically had performance issues compared to calling transpose directly.

Out of the box, using einops with PyTorch 2.0 and torch.compile will also decrease performance, since torch._dynamo doesn't recognize einops code as traceable. This causes a break in the model's compiled graph, kicking the rearrange computation back to the slower eager mode. This may be fixed in future PyTorch versions.

There is a workaround using torch._dynamo.allow_in_graph, which I have reproduced from chilli in the EleutherAI Discord:

from einops import rearrange
from torch._dynamo import allow_in_graph

allow_in_graph(rearrange)


This will tell dynamo that rearrange is traceable, allowing PyTorch to compile rearrange into the model graph. I believe this will work for most, if not all, einops methods.

allow_in_graph is not required for torch.einsum since dynamo is aware of it.
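For example (the shapes here are my own, just for illustration), an attention-scores contraction written with torch.einsum needs no extra registration:

```python
import torch

q = torch.randn(2, 8, 10, 64)   # (batch, heads, query_len, head_dim)
k = torch.randn(2, 8, 12, 64)   # (batch, heads, key_len, head_dim)

# dynamo already knows how to trace torch.einsum, so no allow_in_graph is needed
scores = torch.einsum('bhqd,bhkd->bhqk', q, k)
print(scores.shape)             # torch.Size([2, 8, 10, 12])
```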


I noticed that we have an imports.py in the miniai folder. Was this manually created, or was it autogenerated by nbdev? I am not so familiar with nbdev, and in my own version of the repo I have been avoiding manually changing things in this folder.

I manually created imports.py


FYI, there was an error in this video (h/t @ste for spotting) where I accidentally drew the heads in multi-headed attention on the wrong axis of the matrix. I’ve uploaded a new video where I’ve added some text to explain this now.


@jeremy btw the video embedded in Practical Deep Learning for Coders - 24: Attention & transformers doesn’t work. My guess is that the embedded link is wrong; currently it is:


It should be:


Update on rearrange and torch.compile:

einops 0.6.1 added torch.compile support. If you use an einops layer (Rearrange), it will work out of the box. If you use an einops function (rearrange), as Jeremy does in the lesson, then you still need to register the function to prevent it from breaking the graph. However, einops now has a function to do this automatically:

from einops._torch_specific import allow_ops_in_compiled_graph

allow_ops_in_compiled_graph()

For anyone who got this error: ImportError: cannot import name 'AttentionBlock' from 'diffusers.models.attention'

Apparently AttentionBlock has been replaced by just Attention source

try this: from diffusers.models.attention import Attention as AttentionBlock

But the results won’t be the same because they seem to have changed how the class works.


Thanks for another great lesson!

I’m a little confused by the timestep embeddings. What motivates them? What is the advantage of supplying them to the model instead of, for example, simply supplying alphabar/t/some other scalar that indicates where on the noising schedule we are?

My best guess is that we want to pass as rich a representation of the degree of noise as possible to give the MLP that takes in the timestep embeddings the opportunity to learn something useful. I’m guessing the approach we implemented in this notebook of exponentiating/scaling and then applying sine/cosine is fairly arbitrary and we could come up with a bunch of similar approaches.
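For reference, here is a minimal sketch of that exponentiate/scale then sin/cos approach (the function name and the max_period value are my assumptions, not the notebook's exact code):

```python
import math
import torch

def timestep_embedding(t, emb_dim, max_period=10000):
    # Scale t by a range of exponentially spaced frequencies,
    # then take sin and cos of each scaled value. This turns a
    # single scalar timestep into a rich emb_dim-sized vector.
    half = emb_dim // 2
    freqs = torch.exp(-math.log(max_period) * torch.arange(half) / half)
    args = t[:, None].float() * freqs[None, :]          # (batch, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

t = torch.tensor([0, 10, 100])
emb = timestep_embedding(t, 32)
print(emb.shape)   # torch.Size([3, 32])
```

Note that nearby timesteps get similar embeddings while distant ones differ across many dimensions, which is presumably what gives the downstream MLP something useful to work with.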