Resources for really understanding how to apply attention?


(WG) #1

Looking for recommended resources for really understanding attention, the different kinds of attention one can use, and how to interpret the attentional weights (esp. when they don’t line up as expected).

I understand how attention works conceptually, but even after going through the translate.ipynb notebook several times, I’m still not sure why it is implemented the way it is in that notebook.


(Even Oldridge) #2

I’m studying it myself right now. One of the best resources I’ve found so far is:
http://nlp.seas.harvard.edu/2018/04/03/attention.html

It’s a great mix of a walkthrough of the “Attention Is All You Need” paper and the corresponding code that implements it, which I found really helpful. I actually meant to post it to the fora a while ago but haven’t had the chance. That’s in the context of language modelling, but there are other examples and resources for image-based attention, like Show, Attend and Tell.
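For reference, the core operation in that paper is scaled dot-product attention: score each query against every key, softmax the scores into weights, and take the weighted average of the values. Here is a rough pure-Python sketch of that idea. The function names and the toy numbers are my own assumptions for illustration, not the code from the linked walkthrough or from translate.ipynb.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of vectors.

    weights[i][j] is how much query i attends to key j; each row sums to 1.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        row = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [
            sum(w * v[c] for w, v in zip(row, values))
            for c in range(len(values[0]))
        ]
        weights.append(row)
        outputs.append(out)
    return outputs, weights

# Toy example with made-up numbers: 2 queries, 3 key/value pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs, weights = scaled_dot_product_attention(queries, keys, values)
```

Each row of `weights` is what you would plot in an attention heatmap: row i shows how query i spreads its attention over the keys.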

https://youtu.be/ByjaPdWXKJ4?t=2287 has a good explanation of attention that I found helpful, although I’m not sure how up to date it is.

https://arxiv.org/pdf/1807.03756v1.pdf is really interesting and just came out. The code is open source (https://github.com/harvardnlp/var-attn/). I need to work through it to really understand what’s going on, but my understanding isn’t quite there yet, so I haven’t dug in.

Lastly I found: http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/ to be interesting.

I’m curious what other resources people have for attention. It’s the most interesting topic in deep learning for me, but I feel like I’m struggling to get below the surface level implementation in terms of understanding it.


(WG) #3

Thanks for the links (I feel like we are kinda in the same boat).

For me, the following are proving helpful:

and

In addition to understanding the various attention implementations, I’m also confused by interpreting the attentional weights. For example, the attentional weights in the translate.ipynb notebook seem to be off by 1 and I can’t for the life of me figure out why. It’s probably something simple that I’m missing, but what that is I don’t know.