Hey everyone. I’m curious whether anyone has read about or tried applying transfer learning techniques to sequence-to-sequence models. In particular, I’m wondering how well it would work to use pretrained layers (e.g. as in ULMFiT) for at least the encoder, while fine-tuning the decoder layers. Does this idea sound stupid/crazy at all to you?
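To make the idea concrete, here’s a rough PyTorch sketch of what I’m picturing: an encoder initialized from a pretrained language model and frozen, with a fresh decoder trained on top. The weight names and layer shapes here are placeholders, not fastai’s actual ULMFiT checkpoint format:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=400, hid_dim=1150):
        super().__init__()
        # Encoder shaped like a ULMFiT-style AWD-LSTM language model,
        # so its weights could be initialized from a pretrained LM.
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, num_layers=3, batch_first=True)
        # Decoder: randomly initialized, fine-tuned on the summarization task.
        self.decoder = nn.LSTM(emb_dim, hid_dim, num_layers=1, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        enc_out, (h, c) = self.encoder(self.embedding(src))
        # Seed the decoder with the encoder's top-layer final state.
        dec_out, _ = self.decoder(self.embedding(tgt), (h[-1:], c[-1:]))
        return self.out(dec_out)

model = Seq2Seq(vocab_size=30000)

# Hypothetical: load pretrained LM weights into the embedding/encoder.
# lm_state = torch.load("pretrained_lm.pth")
# model.embedding.load_state_dict(lm_state["embedding"])
# model.encoder.load_state_dict(lm_state["rnn"])

# Freeze the pretrained parts; train only the decoder at first.
for p in model.embedding.parameters():
    p.requires_grad = False
for p in model.encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```

In ULMFiT fashion I’d presumably then unfreeze the encoder gradually (with discriminative learning rates) once the decoder has stabilized, but I have no idea yet whether that carries over to the seq2seq setting.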
Note that in my case I’m looking at applying this to text summarization (in English). On a related note, if anyone has interesting experience or advice on doing abstractive text summarization with deep learning, I’d be happy to hear what you’ve got.
Particularly if you’ve done anything that deals with summarizing whole corpora, or generating long summaries (more than a few sentences). The closest thing I’ve found is the pointer-generator network from Stanford (See, Liu, and Manning), but it only produces summaries of 3–4 sentences and only summarizes individual documents.
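For anyone who hasn’t seen it, the core of the pointer-generator is a learned mixture between generating from the vocabulary and copying a source token via attention. Here’s my rough paraphrase of the final-distribution step (shapes simplified, not the authors’ code):

```python
import torch

def final_distribution(p_gen, vocab_dist, attn_weights, src_ids):
    """Pointer-generator mixture: generate from the vocab with prob p_gen,
    copy a source token (via attention) with prob 1 - p_gen.

    p_gen:        (batch, 1)        sigmoid-gated mixing weight per step
    vocab_dist:   (batch, vocab)    softmax over the output vocabulary
    attn_weights: (batch, src_len)  attention over source positions
    src_ids:      (batch, src_len)  source token ids (in the paper these
                                    index an extended vocab covering OOVs)
    """
    gen = p_gen * vocab_dist
    copy = (1.0 - p_gen) * attn_weights
    # Scatter-add each position's copy probability onto its token id.
    return gen.scatter_add(1, src_ids, copy)
```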