Transfer learning for seq2seq + abstractive text summarization


Hey everyone. I’m curious whether anyone has read about or attempted applying transfer learning techniques to sequence-to-sequence models. In particular, I’m wondering how well it would work to use pretrained layers (e.g. as in ULMFiT) for at least the encoder, while fine-tuning the decoder layers. Does this idea sound stupid/crazy at all to you?
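To make the idea concrete, here’s a minimal PyTorch sketch of what I have in mind, assuming a ULMFiT-style pretrained language model whose embedding and RNN weights get copied into the encoder (the `pretrained_lm` dict below is hypothetical), which is then frozen while the decoder side trains from scratch:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder where the encoder can start from pretrained LM weights."""
    def __init__(self, vocab_size, emb_dim=400, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(vocab_size, emb_dim)
        self.tgt_emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))           # encode the source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)  # decoder starts from encoder state
        return self.out(dec_out)                             # per-step vocab logits

model = Seq2Seq(vocab_size=1000)

# Transfer step (hypothetical): copy pretrained LM weights into the source
# embedding and encoder, then freeze them so only the decoder side trains.
# model.src_emb.load_state_dict(pretrained_lm['emb'])
# model.encoder.load_state_dict(pretrained_lm['rnn'])
for p in list(model.src_emb.parameters()) + list(model.encoder.parameters()):
    p.requires_grad = False
```

Presumably a proper ULMFiT-style schedule would gradually unfreeze the encoder later rather than keeping it frozen for good, but this is the basic shape of it.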

Note that in my case I’m looking to apply this to text summarization (in English). On a related note, if anyone has interesting experience with or advice on abstractive text summarization with deep learning, I’d be happy to hear what you’ve got.

Particularly if you’ve done anything that deals with summarizing whole corpora, or generating long summaries (more than a few sentences). The closest thing I’ve found is the pointer-generator network from Stanford (See, Liu, and Manning), but it only produces summaries of 3-4 sentences, and only summarizes individual documents.

(Jeremy) #2

I don’t really have any suggestions one way or the other, as I’m also new to the field and still learning what works and what doesn’t. I can say that building an intent classifier to pick the correct intent from 20 choices with a small training set (< 100 examples each) did not prove fruitful, even when using word embeddings: I couldn’t get better than ~80% validation accuracy, and there was no meaningful difference between randomly initialized embeddings and GloVe. I wasn’t yet aware of the language-model pretraining approach behind ULMFiT, and am currently learning more about it. Let’s keep in touch about getting things working; I have a lot of NLP/NLU research to do in the near future and will be trying a variety of deep learning techniques, possibly including text summarization / topic modeling sorts of approaches. Sorry this wasn’t more helpful :frowning:
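For reference, the random-vs-GloVe comparison I ran boils down to how the embedding layer is initialized. A minimal PyTorch sketch, where `glove_matrix` stands in for a real matrix of GloVe row vectors for the vocabulary (random numbers here, since the actual vectors come from the downloaded GloVe file):

```python
import torch
import torch.nn as nn

VOCAB, EMB_DIM, N_INTENTS = 500, 50, 20

# Placeholder for real GloVe vectors; in practice you'd build this by
# looking up each vocabulary word in the GloVe text file.
glove_matrix = torch.randn(VOCAB, EMB_DIM)

class IntentClassifier(nn.Module):
    def __init__(self, pretrained=None):
        super().__init__()
        if pretrained is not None:
            # freeze=False lets the embeddings keep fine-tuning on the task
            self.emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
        else:
            self.emb = nn.Embedding(VOCAB, EMB_DIM)  # random initialization
        self.fc = nn.Linear(EMB_DIM, N_INTENTS)

    def forward(self, tokens):
        # Mean-pool the token embeddings, then classify into one of 20 intents
        return self.fc(self.emb(tokens).mean(dim=1))

model = IntentClassifier(pretrained=glove_matrix)
```

The classifier head here is deliberately simplistic; the point is just that swapping `pretrained=glove_matrix` for `pretrained=None` is the only difference between the two runs.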


I’m curious: have you tried a seq2seq model with a pretrained encoder (like the ULMFiT model)? I remember from part 2’s neural translation lecture that he said it ought to be possible to bring the ULMFiT transfer-learning approach to seq2seq models, but I haven’t seen anything about it on the forums.