Looking for recommended resources for really understanding attention: the different kinds of attention one can use, and how to interpret the attention weights (especially when they don’t line up as expected).
I understand how attention works conceptually, but having gone through the translate.ipynb notebook several times, I’m still not completely sure why it is implemented the way it is in that notebook.
It’s a great mix of a walkthrough of the “Attention Is All You Need” paper and the corresponding code that implements it, which I found really helpful. I actually meant to post it to the forums a while ago but haven’t had the chance. That’s in the context of language modelling, but there are other examples and resources for image-based attention, like Show, Attend, and Tell.
I’m curious what other resources people have for attention. It’s the most interesting topic in deep learning for me, but I feel like I’m struggling to get below the surface-level implementation in terms of understanding it.
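For anyone else trying to get below the surface, here’s a minimal NumPy sketch of scaled dot-product attention (the variant from “Attention Is All You Need”) — this is my own toy illustration, not code from the translate.ipynb notebook, and the shapes/names are just assumptions for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d_v)
    # Score each query against each key, scale by sqrt(d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Each row of `weights` sums to 1 -- these are the attention
    # weights people plot when inspecting alignments.
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

# Toy example with random queries/keys/values.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(w.sum(axis=-1))  # each row sums to 1
```

The main thing that helped me: the weights are just a softmax over query–key similarities, so a row of the weight matrix tells you how much each source position contributed to one output position.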
Thanks for the links (I feel like we are kinda in the same boat).
For me, the following are proving helpful:
and
In addition to understanding the various attention implementations, I’m also confused about interpreting the attention weights. For example, the attention weights in the translate.ipynb notebook seem to be off by one, and I can’t for the life of me figure out why. It’s probably something simple that I’m missing, but I don’t know what.
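One possible explanation worth ruling out (just a guess — I don’t know if this is what’s happening in translate.ipynb): with teacher forcing, the decoder input is the target sequence shifted right by one, so the attention row computed at decoder step t aligns with the token being *predicted* at step t, not with the decoder *input* token at step t. If a plot labels the rows with the decoder inputs, everything looks shifted by one. A toy sketch with made-up tokens:

```python
# Hypothetical illustration, not code from the notebook.
tgt = ["the", "black", "cat"]          # target tokens to predict
decoder_inputs = ["<bos>"] + tgt[:-1]  # teacher forcing: target shifted right

for t in range(len(tgt)):
    # Labelling attention row t with decoder_inputs[t] looks off by one;
    # the row actually aligns with the predicted token tgt[t].
    print(f"step {t}: input={decoder_inputs[t]!r}  predicts={tgt[t]!r}")
```

If the notebook’s plotting code labels rows with the shifted inputs rather than the predictions, that would produce exactly this kind of off-by-one appearance — but again, that’s speculation on my part.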