Better Summarization

An interesting extension of the standard seq2seq + attention model for summarization.

The main additions are a probability of copying a word from the source text instead of generating it from the vocabulary, which allows out-of-vocabulary words in the summary, and a coverage penalty, which minimizes the overlap between the current attention vector and the sum of all previous attention vectors, to avoid repetition.
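To make the two mechanisms concrete, here is a minimal pure-Python sketch. It assumes toy representations (a dict for the vocabulary distribution, plain lists for attention weights), and the function names are hypothetical, not taken from any released code. `final_distribution` mixes generating and copying with a single probability `p_gen`; `coverage_loss` sums, over decoder steps, the elementwise minimum of the current attention and the running sum of earlier attention. Only the penalty term is shown here, not how coverage might also feed back into the attention itself.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Hypothetical helper: mix generation and copying into one distribution.

    p_gen: probability of generating from the vocabulary
        (1 - p_gen is the probability of copying from the source).
    vocab_dist: dict mapping word -> probability over the fixed vocabulary.
    attention: attention weights, one per source position (sums to 1).
    source_tokens: source words, which may include out-of-vocabulary words.
    """
    dist = defaultdict(float)
    # Generation: scale the vocabulary distribution by p_gen.
    for word, p in vocab_dist.items():
        dist[word] += p_gen * p
    # Copying: each source position adds its attention weight to the word
    # found there, so OOV source words still receive probability mass.
    for weight, word in zip(attention, source_tokens):
        dist[word] += (1.0 - p_gen) * weight
    return dict(dist)


def coverage_loss(attention_history):
    """Hypothetical helper: sum over steps t of sum_i min(a_i^t, c_i^t).

    c^t (the coverage vector) is the sum of the attention vectors from all
    previous steps. The elementwise minimum is large only when the decoder
    attends again to source positions it has already attended to.
    """
    total = 0.0
    coverage = [0.0] * len(attention_history[0])
    for attention in attention_history:
        total += sum(min(a, c) for a, c in zip(attention, coverage))
        coverage = [c + a for c, a in zip(coverage, attention)]
    return total
```

For example, with `p_gen = 0.25` and most attention on an out-of-vocabulary source word such as `"zorblax"`, that word gets probability of about 0.6 even though it never appears in `vocab_dist`. Attending to the same position on two consecutive steps (`[[1.0, 0.0], [1.0, 0.0]]`) gives a coverage loss of 1.0, while moving attention to a new position (`[[1.0, 0.0], [0.0, 1.0]]`) gives 0.0.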

Looks very promising, and it has a nice write-up in addition to the paper:

