Resources for really understanding how to apply attention?

wgpubs · July 12, 2018, 12:56am

Looking for recommended resources for really understanding attention, the different kinds of attention one can use, and how to interpret the attentional weights (esp. when they don’t line up as expected).

I understand how attention works conceptually, but even after going through the translate.ipynb notebook several times, I’m still not sure why it is implemented the way it is in that notebook.

Even · July 13, 2018, 3:43am

I’m studying it myself right now. One of the best resources I’ve found so far is:
http://nlp.seas.harvard.edu/2018/04/03/attention.html

It’s a great mix of a walkthrough of the “Attention Is All You Need” paper and the corresponding code that implements it, which I found really helpful. I actually meant to post it to the fora a while ago but haven’t had the chance. That’s in the context of language modelling, but there are other examples and resources for image-based attention, like Show, Attend and Tell.
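For reference, the core operation in that paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. Here is a minimal NumPy sketch of it. This is not the code from the linked walkthrough; the function name and the toy shapes are made up for illustration, and it leaves out masking and multiple heads.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q: (n_queries, d_k), k: (n_keys, d_k), v: (n_keys, d_v)."""
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(d_k)
    # Numerically stable softmax over the keys (last axis).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ v, weights

# Toy inputs (illustrative only): 3 queries, 5 keys/values.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
k = rng.normal(size=(5, 8))
v = rng.normal(size=(5, 4))
context, weights = scaled_dot_product_attention(q, k, v)
```

Each row of `weights` sums to 1 and says how much one query draws on each key, which is the matrix people usually plot as an attention heatmap.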

https://youtu.be/ByjaPdWXKJ4?t=2287 has a good explanation of attention that I found helpful, although I’m not sure how up to date it is.

https://arxiv.org/pdf/1807.03756v1.pdf is really interesting and just came out. It is open source, so you can look at the code (https://github.com/harvardnlp/var-attn/). I need to read it to really understand what’s going on, but I’m not quite there in terms of my understanding, so I haven’t dug in yet.

Lastly I found: http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/ to be interesting.

I’m curious what other resources people have for attention. It’s the most interesting topic in deep learning for me, but I feel like I’m struggling to get below a surface-level understanding of the implementation.

wgpubs · July 13, 2018, 4:32am

Thanks for the links (I feel like we are kinda in the same boat).

For me, the following are proving helpful:

https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation.ipynb (Practical PyTorch: Translation with a Sequence to Sequence Network and Attention)

and

In addition to understanding the various attention implementations, I’m also confused about how to interpret the attentional weights. For example, the attentional weights in the translate.ipynb notebook seem to be off by 1, and I can’t for the life of me figure out why. It’s probably something simple that I’m missing, but I don’t know what.
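One way an apparent off-by-1 can show up, offered as a guess rather than a diagnosis of translate.ipynb: the rows of the attention matrix get labelled with the decoder inputs (a start token followed by the previous outputs) instead of the tokens actually produced at each step. The sketch below uses made-up tokens, a hypothetical `<sos>` start token, and toy scores to show how the same weights read as shifted under the second labelling.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical tokens, for illustration only.
src = ["il", "est", "grand", "."]
out = ["he", "is", "tall", "."]

# Toy scores: output step t attends most strongly to source position t.
scores = 5.0 * np.eye(len(out), len(src))
# Rows are output steps, columns are source tokens; each row sums to 1.
attn = softmax(scores, axis=1)

def peaks(row_labels, attn, src):
    """Pair each row label with the source token that row attends to most."""
    return [(label, src[j]) for label, j in zip(row_labels, attn.argmax(axis=1))]

# Rows labelled with the token produced at each step: looks aligned.
aligned = peaks(out, attn, src)

# Rows labelled with the decoder *input* at each step: looks off by one.
dec_inputs = ["<sos>"] + out[:-1]
shifted = peaks(dec_inputs, attn, src)

print(aligned)
print(shifted)
```

If the plot in a notebook looks like the second case, checking which token list is used for the axis labels (and whether a start or end token was dropped from one axis) is a cheap first test before digging into the model itself.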