Unexpected attention weights (from translate.ipynb)

(WG) #1

I’m looking at the attention given to the English word “components” below, and while I expected most of the attention to be placed on the French word “composantes”, according to the attn variable the attention is mostly on the word “des”.

My guess is that tokens are shifted by 1 in the attn variable for some reason, but I’m not sure. Any clarification on how to match the attention weights with the right tokens would be appreciated.
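One way to check the shift hypothesis is to pair each attention row with the source tokens and see whether applying a one-position correction recovers the expected alignment. The sketch below is purely illustrative: the sentence, the token lists, and the fabricated weights are all hypothetical, and it assumes attn has shape (output_len, input_len), with one row of source weights per generated target token.

```python
import numpy as np

# Hypothetical tokens; the notebook's actual sentence and attn shape may differ.
src = ["les", "composantes", "des", "vents", "<EOS>"]
tgt = ["the", "components", "of", "winds", "<EOS>"]

# Fabricated weights reproducing the symptom: each target token's mass
# lands one source position too far to the right ("components" -> "des").
attn = np.zeros((len(tgt), len(src)))
for i in range(len(tgt)):
    attn[i, min(i + 1, len(src) - 1)] = 1.0

def top_source(attn_row, src_tokens, shift=0):
    """Source token with the largest weight, after an optional index shift."""
    j = int(np.argmax(attn_row)) + shift
    return src_tokens[min(max(j, 0), len(src_tokens) - 1)]

i = tgt.index("components")
print(top_source(attn[i], src))            # "des" (the surprising result)
print(top_source(attn[i], src, shift=-1))  # "composantes" (after correcting)
```

If shift=-1 consistently lines the weights up with the intuitive alignment across many sentences, that supports an off-by-one in how the attn rows or columns are indexed (a prepended SOS token is a common culprit); if it only helps for some tokens, the weights may simply be diffuse rather than shifted.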

Blog post for Seq2Seq translation model with attention