I’m trying to build up a portfolio of deep learning projects and turn them into blog posts. I recently built a Seq2Seq translation model with attention and wrote a pretty detailed blog post walking through the conceptual details. There is also a Jupyter notebook that walks through building and training the model.
This is the first in-depth post that I’ve written, so I would really appreciate some feedback. I spent some time making some cool illustrations for it, so it’s worth checking out just for that.
You can see it here:
The post was really well written and the notebook was also very helpful. I will definitely be using them as a reference when I create my own post. Thank you.
Awesome! I’m glad you found it helpful.
I noticed the same thing regarding the attentional weights in the translate.ipynb notebook (see Unexpected attention weights (from translate.ipynb)).
I was hoping to be able to use these weights, but as it is, they don’t seem to make much sense.
Thanks! I’m also a bit confused by the attention weights. My first guess is that early in the translation the decoder benefits more from looking further down the sentence. The decoder also has access to the hidden state passed by the encoder when it makes its decision, so early in the translation that hidden state might be the more dominant factor.
Two ideas I had: check whether the attention weights are consistently off by one index, or whether they get back on track later in the sequence. You could also try initializing the decoder with a blank hidden state, which would train it to rely on the information coming from the attention model rather than on the hidden state from the encoder.
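If it helps, here is a rough sketch of how the off-by-one check could be done. It is not taken from the notebook: the function names and the example matrix are made up for illustration, and it assumes the attention weights are available as a list of rows (one row per decoder step, one column per source token) and that the two languages have roughly the same word order, so that the "expected" peak for decoder step t is source index t.

```python
from collections import Counter


def attention_offsets(weights):
    """For each decoder step, return (index of the largest weight) - (step index).

    Assumes a roughly monotonic alignment, so an offset of 0 means the
    decoder is attending to the source token at the same position.
    """
    offsets = []
    for step, row in enumerate(weights):
        # Index of the source token with the largest attention weight.
        peak = max(range(len(row)), key=row.__getitem__)
        offsets.append(peak - step)
    return offsets


def summarize_offsets(offsets):
    """Return the most common offset and the fraction of steps that share it."""
    counts = Counter(offsets)
    offset, n = counts.most_common(1)[0]
    return offset, n / len(offsets)


# Hypothetical attention matrix, not real model output.
example = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.1, 0.7],
]

offsets = attention_offsets(example)
common_offset, share = summarize_offsets(offsets)
print(offsets, common_offset, share)
```

If the most common offset is 1 across many sentences, the weights are probably shifted by one position; if the offsets start at 1 and drop to 0 later in the sequence, that would fit the "gets back on track" case instead.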
Hi, I’ve been looking for something that would explain the concept of attention in transformers, and I found your post on our forum.
Your code did that so clearly. I mean, it showed me explicitly how attention works.
Really good job!