This is a wiki post - feel free to edit to add links from the lesson or other useful info.
Thanks for a great lesson! There is a lot to digest in this one; I think it might take me a while, but it's great to see it all coming together.
There is one downside to using `einops` and `torch.einsum`: they've historically had performance issues compared to using the transpose method. Out of the box, using `einops` with PyTorch 2.0 and `torch.compile` will also decrease performance, since `torch._dynamo` doesn't recognize `einops` code as traceable. This causes a break in the model's compiled graph, kicking computation for `rearrange` back to the slower eager mode. This may be fixed in future PyTorch versions.
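For reference, the "transpose method" here means chaining `reshape` and `transpose` calls by hand. A minimal NumPy sketch (the shapes are illustrative assumptions, not from the lesson) of splitting attention heads that way, with the equivalent `rearrange` pattern in a comment:

```python
import numpy as np

# Illustrative shapes: batch=2, sequence=4, heads=3, head_dim=5
x = np.random.rand(2, 4, 3 * 5)

# The "transpose method": split the head dimension out with reshape,
# then move the head axis next to batch with transpose.
# Equivalent einops call, for comparison:
#   rearrange(x, "b s (h d) -> b h s d", h=3)
heads = x.reshape(2, 4, 3, 5).transpose(0, 2, 1, 3)
print(heads.shape)  # (2, 3, 4, 5)
```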
There is a workaround using `torch._dynamo.allow_in_graph`, which I have reproduced from chilli in the EleutherAI Discord:

```python
from einops import rearrange
from torch._dynamo import allow_in_graph

allow_in_graph(rearrange)
```
This tells dynamo that `rearrange` is traceable, allowing PyTorch to compile `rearrange` into the model graph. I believe this will work for most, if not all, `einops` methods. `allow_in_graph` is not required for `torch.einsum`, since dynamo is already aware of it.
I noticed that we have an `imports.py` in the `miniai` folder. Is this manually created, or was it autogenerated by nbdev? I am not so familiar with nbdev, and in my own version of the repo I have been avoiding manually changing things in this folder.
I manually created `imports.py`.
FYI, there was an error in this video (h/t @ste for spotting) where I accidentally drew the heads in multi-headed attention on the wrong axis of the matrix. I’ve uploaded a new video where I’ve added some text to explain this now.
@jeremy btw the video embedded in Practical Deep Learning for Coders - 24: Attention & transformers doesn't work. My guess is that the embedded link is wrong; currently it is:

`src="https://www.youtube-nocookie.com/embed/https://www.youtube.com/watch?v=DH5bp6zTPB4?modestbranding=1"`

It should be:

`src="https://www.youtube-nocookie.com/embed/DH5bp6zTPB4?modestbranding=1"`
Update on `rearrange` and `torch.compile`: einops 0.6.1 added `torch.compile` support. If you use an einops layer (`Rearrange`), it will work out of the box. If you use an einops function (`rearrange`), as Jeremy does in the lesson, then you still need to register the function to prevent it from breaking the graph. However, einops now has a function to do this automatically:

```python
from einops._torch_specific import allow_ops_in_compiled_graph

allow_ops_in_compiled_graph()
```
For anyone that got this error: `ImportError: cannot import name 'AttentionBlock' from 'diffusers.models.attention'`

Apparently `AttentionBlock` has been replaced by just `Attention` (source). Try this:

```python
from diffusers.models.attention import Attention as AttentionBlock
```

But the results won't be the same, because they seem to have changed how the class works.
Thanks for another great lesson!
I'm a little confused by the timestep embeddings. What motivates them? What is the advantage of supplying them to the model instead of, for example, simply supplying the alphabar/t/some other scalar which indicates where on the noising schedule we are?
My best guess is that we want to pass as rich a representation of the degree of noise as possible to give the MLP that takes in the timestep embeddings the opportunity to learn something useful. I’m guessing the approach we implemented in this notebook of exponentiating/scaling and then applying sine/cosine is fairly arbitrary and we could come up with a bunch of similar approaches.
Thanks!
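For what it's worth, the exponentiate/scale then sin/cos scheme can be written out in a few lines of plain Python (the dimension and max period below are illustrative, not the lesson's exact values):

```python
import math

def timestep_embedding(t, dim=8, max_period=10000):
    # Exponentially spaced frequencies, from 1 down toward 1/max_period
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    # Scale the timestep by each frequency, then take sin and cos
    args = [t * f for f in freqs]
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]

# Nearby timesteps get nearby embeddings, and each dimension varies at a
# different rate, giving the MLP a richer signal than a single scalar t.
emb = timestep_embedding(10, dim=8)
print(len(emb))  # 8
```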