This is a wiki post - feel free to edit to add links from the lesson or other useful info.
Thanks for a great lesson. There is a lot to digest in this one and I think it might take me a while, but it's great to see it all coming together.
There is one downside to using torch.einsum: it has historically had performance issues compared to using the transpose method.
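For reference, here is a minimal sketch of the two approaches being compared, assuming "the transpose method" means transposing the last two axes of the keys and using a batched matrix multiply. The tensor shapes and variable names are made up for illustration and are not taken from the lesson notebook.

```python
import torch

torch.manual_seed(0)

# Hypothetical shapes: batch=2, heads=4, sequence length=5, head dim=8
q = torch.randn(2, 4, 5, 8)
k = torch.randn(2, 4, 5, 8)

# einsum version: contract over the head dimension d
scores_einsum = torch.einsum("bhid,bhjd->bhij", q, k)

# transpose version: swap the last two axes of k, then batched matmul
scores_transpose = q @ k.transpose(-2, -1)

# Both should give the same attention scores up to floating point error
same = torch.allclose(scores_einsum, scores_transpose, atol=1e-4)
print(scores_einsum.shape, same)
```

Both produce the same result, so any difference between them is about speed, which is worth timing on your own hardware and PyTorch version.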
Out of the box, using einops with PyTorch 2.0 and torch.compile will also decrease performance, since torch._dynamo doesn't recognize einops code as traceable. This causes a break in the model's compiled graph, kicking computation for rearrange back to the slower eager mode. This may be fixed in future PyTorch versions.
There is a workaround using torch._dynamo.allow_in_graph, which I have reproduced from chilli in the EleutherAI Discord:
from einops import rearrange
from torch._dynamo import allow_in_graph
allow_in_graph(rearrange)
This tells dynamo that rearrange is traceable, allowing PyTorch to compile rearrange into the model graph. I believe this will work for most, if not all, einops functions.
allow_in_graph is not required for torch.einsum since dynamo is aware of it.
I noticed that we have an imports.py in the miniai folder. Was this manually created, or was it autogenerated by nbdev? I am not so familiar with nbdev, and in my own version of the repo I have been avoiding manually changing things in this folder.
I manually created it.
FYI, there was an error in this video (h/t @ste for spotting) where I accidentally drew the heads in multi-headed attention on the wrong axis of the matrix. I’ve uploaded a new video where I’ve added some text to explain this now.
@jeremy btw the video embedded in Practical Deep Learning for Coders - 24: Attention & transformers doesn’t work. My guess is that the embedded link is wrong, currently it is:
it should be
einops 0.6.1 added torch.compile support. If you use an einops layer (Rearrange), it will work out of the box. If you use an einops function (rearrange), as Jeremy does in the lesson, then you still need to register the function to prevent it from breaking the graph. However, einops now has a function to do this automatically:
from einops._torch_specific import allow_ops_in_compiled_graph
allow_ops_in_compiled_graph()