I’m trying to build up a portfolio of deep learning projects and turn them into blog posts. I recently built a Seq2Seq translation model with attention and wrote a fairly detailed blog post walking through the conceptual details. There is also a Jupyter notebook that walks through building and training the model.
This was the first in-depth post that I’ve written, so I would really appreciate some feedback. I spent some time making some cool illustrations for it, so it’s worth checking out just for that.
The post was really well written, and the notebook was also very helpful. I’ll definitely be using them as a reference when creating my own post. Thank you.
Thanks! I’m also a bit confused by the attention weights. My first guess is that early in the translation the decoder gets more benefit from looking further down the sentence. The decoder also has access to the hidden state passed along by the encoder, so early in the translation that state might be the more dominant factor in its decisions.
One idea would be to check whether the attention weights are consistently off by one index, or whether they get back on track later in the sequence. You could also try initializing the decoder with a blank hidden state, which would force it to rely on the information coming from the attention mechanism rather than on the encoder’s final hidden state.
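The off-by-one check above is easy to run on the attention matrix your notebook already produces. Here is a minimal sketch with NumPy, using a made-up toy matrix (the real one would come from your model): rows are assumed to be decoder steps and columns encoder positions, and for a roughly monotonic language pair the peaks should sit near the diagonal.

```python
import numpy as np

# Hypothetical attention matrix (decoder steps x encoder positions).
# In the real notebook this would be the stacked attention weights
# collected during decoding; here it peaks one column past the diagonal.
attn = np.array([
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
])

# Most-attended encoder position at each decoder step.
peaks = attn.argmax(axis=1)

# Deviation from the diagonal; a constant vector of 1s would suggest
# a systematic off-by-one rather than noise.
offsets = peaks - np.arange(attn.shape[0])

print(offsets)
```

If the offsets start at 1 and drift back to 0 later in the sequence, that would support the guess that the encoder’s hidden state dominates only at the start of the translation.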
Hi, I’ve been looking for something that would explain the concept of attention in Transformers, and I found your post on this forum.
Your code did exactly that: it showed me explicitly how attention works.
Thank you!!!