Blog post for Seq2Seq translation model with attention


(Mac Brennan) #1

Hi Everyone,

I’m trying to build up a portfolio of deep learning projects and turn them into blog posts. I recently built a Seq2Seq translation model with attention and wrote a fairly detailed blog post walking through the conceptual details. There is also a Jupyter notebook that walks through building and training the model.

This is the first in-depth post I’ve written, so I would really appreciate some feedback. I spent some time making some cool illustrations for it, so it’s worth checking out just for those.

You can see it here:
https://macbrennan90.github.io/neural-translation-model.html

Thanks!


#2

The post was really well written and the notebook was also very helpful. I will definitely be using them as a reference when creating my own post. Thank you.


(Mac Brennan) #3

Awesome! I’m glad you found it helpful.


(WG) #4

Nice write-up.

I noticed the same thing regarding the attention weights in the translate.ipynb notebook (see Unexpected attention weights (from translate.ipynb)).

Was hoping to be able to use these weights, but as it is, they don’t seem to make much sense.


(Mac Brennan) #5

Thanks! I’m also a bit confused by the attention weights. My first guess is that early in the translation, the decoder gets more benefit from looking further along the sentence. The decoder also has access to the hidden state passed from the encoder when making its prediction, so early in the translation that hidden state may be the more dominant factor.

A couple of ideas: check whether the attention weights are consistently off by one index, and whether they get back on track later in the sequence. You could also try initializing the decoder with a blank hidden state, so it is trained to rely on the information coming from the attention mechanism rather than the hidden state passed from the encoder.
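
To make the first check concrete, here is a rough sketch of what I mean (variable names, shapes, and the toy data are just placeholders, not the notebook’s actual variables):

import torch

# Sketch: suppose `attn` holds the attention weights collected while
# translating one sentence, with shape (target_len, source_len).
def alignment_offsets(attn):
    """For each decoder step t, return argmax(attn[t]) - t, i.e. how far
    the most-attended source position is from the position we'd naively
    expect for a roughly monotonic alignment."""
    return [attn[t].argmax().item() - t for t in range(attn.size(0))]

# Toy example: weights that are off by one for the first three steps.
attn = torch.tensor([[0.10, 0.80, 0.05, 0.05],
                     [0.05, 0.10, 0.80, 0.05],
                     [0.05, 0.05, 0.10, 0.80],
                     [0.05, 0.05, 0.10, 0.80]])
print(alignment_offsets(attn))  # -> [1, 1, 1, 0]

# Sketch of the second idea: start the decoder from a blank hidden state
# instead of the encoder's final hidden state (sizes are assumptions),
# so the decoder has to lean on the attention context vectors.
num_layers, batch_size, hidden_size = 1, 1, 256
decoder_hidden = torch.zeros(num_layers, batch_size, hidden_size)

If the offsets look like [1, 1, 1, 0, 0, …], that would suggest the weights really are shifted by one early in the sequence and recover later on.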