I’m studying it myself right now. One of the best resources I’ve found so far is:
It’s a great mix of a walkthrough of the “Attention Is All You Need” paper and the corresponding code that implements it, which I found really helpful. I actually meant to post it to the fora a while ago but haven’t had the chance. That’s in the context of language modelling, but there are other examples and resources for image-based attention, like Show, Attend and Tell.
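For anyone who wants a feel for the core operation before reading a full walkthrough, here is a minimal sketch of the scaled dot-product attention from that paper, softmax(QK^T / sqrt(d_k)) V, in plain Python so each step is visible. This is my own illustration, not code from any of the linked resources: the function names and toy vectors are made up, and it leaves out masking, multiple heads, and the learned projections a real model would have.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    Hypothetical helper for illustration: single head, no masking.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy example (made-up numbers): 2 queries attending over 3 key/value pairs.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` is one query’s attention distribution over the keys and sums to 1, and each row of `out` is the matching weighted average of `V`. Printing `w` for a few different `Q` vectors is a quick way to see how the weights shift toward the keys a query lines up with.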
https://youtu.be/ByjaPdWXKJ4?t=2287 has a good explanation of attention that I found helpful, although I’m not sure how up to date it is.
https://arxiv.org/pdf/1807.03756v1.pdf is really interesting and just came out, and the code is open source at https://github.com/harvardnlp/var-attn/ so you can look at it. I need to read that code to really understand what’s going on, but I’m not quite there yet in terms of my understanding, so I haven’t dug in.
Lastly, I found http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/ interesting.
I’m curious what other resources people have for attention. It’s the most interesting topic in deep learning for me, but I feel like I’m struggling to understand it below the level of a surface implementation.