Transfer learning for seq2seq + abstractive text summarization


Hey everyone. I’m curious if anyone has read about or attempted applying transfer learning techniques to sequence-to-sequence models. In particular, I’m wondering how well it would work to use pretrained layers (e.g. as in ULMFiT) for at least the encoder, while fine-tuning the decoder layers. Does this idea sound stupid/crazy at all to you?
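To make the idea concrete, here’s a minimal sketch of what I mean, assuming a plain LSTM seq2seq in PyTorch. All names, sizes, and the checkpoint path are made up for illustration; the point is just that the embedding and encoder come from a pretrained LM and get frozen, while the decoder and output head train from scratch:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a seq2seq summarizer whose embedding + encoder are
# initialized from a pretrained language model (ULMFiT-style) and frozen,
# while the decoder is fine-tuned. Sizes and names are illustrative only.

VOCAB, EMB, HID = 10000, 300, 512

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, tgt):
        # Encode the source document, then condition the decoder
        # on the encoder's final (hidden, cell) state.
        _, state = self.encoder(self.emb(src))
        dec_out, _ = self.decoder(self.emb(tgt), state)
        return self.out(dec_out)

model = Seq2Seq()

# In practice you'd load pretrained LM weights here, e.g.:
# pretrained = torch.load('lm_encoder.pth')  # hypothetical checkpoint
# model.emb.load_state_dict(pretrained['emb'])
# model.encoder.load_state_dict(pretrained['encoder'])

# Freeze the pretrained parts; only the decoder and output head train.
for p in model.emb.parameters():
    p.requires_grad = False
for p in model.encoder.parameters():
    p.requires_grad = False

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

From there you could follow the ULMFiT recipe of gradually unfreezing the encoder once the decoder has settled, rather than keeping it frozen forever.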

Note that in my case I’m looking at applying this to text summarization (in English). On a related note, if anyone has interesting experience or advice on doing (abstractive) text summarization with deep learning, I’d be happy to hear what you’ve got.

Particularly if you’ve done anything that deals with summarizing whole corpora, or generating long summaries (more than a few sentences). The closest thing I’ve found is the pointer-generator network from Stanford (See, Liu, Manning), but it only produces summaries of 3-4 sentences, and only summarizes individual documents.
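For anyone who hasn’t read the See et al. paper: the core trick is a soft switch p_gen that mixes generating from the fixed vocabulary with copying source tokens via attention, which is what lets it handle OOV words. A toy NumPy version of just that final-distribution formula (the numbers below are made up; only the formula is from the paper):

```python
import numpy as np

def final_dist(p_vocab, attention, src_ids, p_gen, ext_vocab_size):
    """Pointer-generator mixture:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum over source positions i
           with token w of attention a_i.

    p_vocab:        (V,) softmax over the fixed vocabulary
    attention:      (T,) attention weights over source positions
    src_ids:        (T,) vocab ids of source tokens (OOVs get ids >= V)
    ext_vocab_size: fixed vocab plus per-document OOVs, so the model
                    can assign probability to copied OOV words
    """
    dist = np.zeros(ext_vocab_size)
    dist[:len(p_vocab)] = p_gen * p_vocab
    for a, i in zip(attention, src_ids):
        dist[i] += (1.0 - p_gen) * a  # copy mass from source positions
    return dist

# Toy example: vocab of 5 words, 3 source tokens, one OOV (id 5).
p_vocab = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
attention = np.array([0.5, 0.3, 0.2])
dist = final_dist(p_vocab, attention,
                  src_ids=np.array([2, 5, 0]),
                  p_gen=0.8, ext_vocab_size=6)
```

Because the copy mass lands directly on source ids, the OOV token (id 5) gets nonzero probability even though it isn’t in the fixed vocabulary.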

(Jeremy) #2

I don’t really have any suggestions one way or the other, as I am also new to the field and currently learning what works and what doesn’t. I can say that building an intent classifier to identify the correct intent from 20 choices with a small training set (< 100 examples each) did not prove fruitful, even when using word embeddings. I was not able to get better than ~80% validation accuracy, and there was no major difference between randomly initialized embeddings and GloVe. I was not yet aware of the language-model pretraining approach behind ULMFiT, and am currently learning more about it.

We can keep in touch re: getting things working, as I have a lot of NLP/NLU research to do in the near future and will be trying a variety of deep learning techniques, perhaps including text summarization / topic modeling sorts of approaches. Sorry this wasn’t more helpful :frowning:


I’m curious, have you tried building a seq2seq model with a pretrained encoder (like the ULMFiT model)? I remember from Part 2’s neural translation lecture that it ought to be possible to bring the ULMFiT transfer learning approach to seq2seq models, but I haven’t seen anything about it on the forums.


(helen) #4

Hey rkingery. I googled transfer learning for text summarization and your post came up. There isn’t much on the internet apart from this paper. I also recently read about ULMFiT and was looking for any paper or code that applies this idea to text summarization.

Have you tried it with ULMFiT yet? Does it improve the results much?

(Nick) #5

Hi all,

I am really interested in this topic as well. For my master’s thesis research I am looking into applying transfer learning to text summarization on a legal document dataset. This dataset is quite small, so my plan is to train an abstractive summarizer on a larger dataset (e.g. CNN) and then fine-tune it on the legal data.
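In case it helps anyone thinking along the same lines, the two-phase plan is basically: train at a normal learning rate on the big dataset, save the weights, then continue on the small domain set at a much lower rate so the model doesn’t forget what it learned. A minimal sketch, assuming PyTorch; the model, checkpoint path, and learning rates are placeholders, not recommendations:

```python
import torch
import torch.nn as nn

# Hypothetical two-phase transfer plan: pretrain on a large news corpus,
# then fine-tune on the small legal dataset at a much lower learning rate.

model = nn.LSTM(300, 512)  # stand-in for the full summarizer

# Phase 1: pretrain on the big dataset (e.g. CNN-style news).
pretrain_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# ... training loop over the news data ...
# torch.save(model.state_dict(), 'news_summarizer.pth')  # hypothetical path

# Phase 2: fine-tune on the small legal dataset, same weights, lower LR.
# model.load_state_dict(torch.load('news_summarizer.pth'))
finetune_opt = torch.optim.Adam(model.parameters(), lr=1e-5)

finetune_lrs = [g['lr'] for g in finetune_opt.param_groups]
```

Discriminative learning rates and gradual unfreezing (as in ULMFiT) would be the natural next refinement on top of this.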

Please share your experiences, and let’s connect in the future.