Transfer learning for seq2seq + abstractive text summarization

Hey everyone. I’m curious if anyone has read about or attempted applying transfer learning techniques to sequence-to-sequence models. In particular, I’m wondering how well it would work to use pretrained layers (e.g. as in ULMFiT) for at least the encoder layers, while “fine-tuning” the decoder layers. Does this idea sound stupid/crazy at all to you?

Note that in my case I’m looking at applying this to text summarization (in English). On a related note, if anyone has any interesting experience or advice on doing (abstractive) text summarization with deep learning, I’d be happy to hear what you’ve got.

I’m particularly interested if you’ve done anything that deals with summarizing whole corpora, or generating long summaries (more than a few sentences). The closest thing I’ve found is the pointer-generator network from Stanford (See, Liu, Manning), but it only works with summaries that are 3-4 sentences long, and only summarizes individual documents.


I don’t really have any suggestions one way or the other, as I am also new to the field and currently learning what works and what doesn’t. I can say that building an intent classifier to pick the correct intent from 20 choices with a small training set (< 100 examples each) did not prove fruitful, even when using word embeddings. I was not able to get better than ~80% validation accuracy, and there was no major difference between randomly initialized embeddings and GloVe. I was not yet aware of the language embeddings approach behind ULMFiT, and am currently learning more about that. We can keep in touch re: getting things working, as I have a lot of NLP / NLU research to do in the near future and will be trying a variety of deep learning techniques, including perhaps text summarization / topic modeling sorts of approaches. Sorry this wasn’t more helpful :frowning:

I’m curious, have you tried doing a seq2seq model with a pre-trained encoder (like the ULMFiT model)? I remember in part 2’s neural translation lecture that he said it ought to be possible to bring the ULMFiT transfer learning approach to seq2seq models, but I haven’t seen anything about it on the forums.
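
For anyone wanting to experiment with the idea, here is a minimal PyTorch sketch of a seq2seq model whose encoder is frozen while only the decoder is trained. It is an illustration under assumptions, not the ULMFiT recipe: the plain LSTM layers, the `build_seq2seq` and `train_step` helpers, and the `pretrained_state` argument (a state dict from some earlier language-model training) are all hypothetical stand-ins, and ULMFiT itself uses an AWD-LSTM with gradual unfreezing.

```python
import torch
from torch import nn
import torch.nn.functional as F


class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) token ids -> final (h, c) state
        outputs, state = self.rnn(self.embedding(src))
        return outputs, state


class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt, state):
        outputs, state = self.rnn(self.embedding(tgt), state)
        return self.out(outputs), state


def build_seq2seq(vocab_size=1000, emb_dim=32, hid_dim=64, pretrained_state=None):
    """Hypothetical helper: frozen encoder, trainable decoder."""
    encoder = Encoder(vocab_size, emb_dim, hid_dim)
    decoder = Decoder(vocab_size, emb_dim, hid_dim)
    if pretrained_state is not None:
        # Assumed to be a state dict saved from a matching pretrained encoder.
        encoder.load_state_dict(pretrained_state)
    for param in encoder.parameters():
        param.requires_grad_(False)  # freeze the encoder weights
    # Only the decoder's parameters are handed to the optimizer.
    optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)
    return encoder, decoder, optimizer


def train_step(encoder, decoder, optimizer, src, tgt):
    """One teacher-forced update that touches only the decoder."""
    optimizer.zero_grad()
    with torch.no_grad():
        _, state = encoder(src)
    logits, _ = decoder(tgt[:, :-1], state)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1)
    )
    loss.backward()
    optimizer.step()
    return loss.item()
```

Once the decoder has settled, the encoder could be unfrozen (possibly with a lower learning rate) to fine-tune the whole model; whether that helps for summarization is exactly the open question in this thread.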


Hey rkingery. I googled transfer learning for text summarization and your post came up. There isn’t much on the internet apart from this paper. I also recently read about ULMFiT and was looking for any paper or code that has used this idea in text summarization.

Have you tried it with ULMFiT? Does it improve the results much?

Hi all,

I am really interested in this topic as well. For my master’s thesis research I am looking into applying transfer learning to text summarization on a legal document dataset. This dataset is quite small, so my plan is to train an abstractive summarizer on a larger dataset (e.g. CNN) and then fine-tune it on the legal dataset.

Please share your experiences, and let’s connect in the future.

Sorry for the delayed reply. I did not try the pretrained LM + fine-tune approach here due to time constraints. Instead I used a pretrained pointer-generator network to generate the summaries. For those who are curious, here’s how I went about generating a single large summary for a large corpus, which was my primary goal:

  1. Use unsupervised methods to cluster the documents into some coherent topics. Naively this suggests using LDA or something like that to do formal topic modeling, but this didn’t work as well in my case as just using k-means to cluster.
  2. From each cluster, select a representative document, e.g. the document closest to the cluster center.
  3. Independently summarize each of these representative documents using a current state-of-the-art summarizer like a pointer-generator network (or possibly a transfer learning approach).
  4. Aggregate those summaries together to produce one corpus summary. I did this by simple concatenation, but you can try to be smarter about it as well.
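
The four steps above can be sketched roughly as follows. This is a pure-Python illustration under assumptions rather than the code actually used: the TF-IDF vectors, the tiny k-means loop, and the helper names (`tfidf_vectors`, `kmeans`, `summarize_corpus`) are all made up for the example, and `summarize` is a placeholder for whatever single-document summarizer you plug in (e.g. a pointer-generator network).

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (list of floats) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centers):
    """Label each vector with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(v, centers[c]))
            for v in vectors]


def kmeans(vectors, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(n_iter):
        labels = assign(vectors, centers)
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(vectors, centers), centers


def summarize_corpus(docs, k, summarize, seed=0):
    """Steps 1-4: cluster, pick representatives, summarize, concatenate."""
    vectors = tfidf_vectors(docs)
    labels, centers = kmeans(vectors, k, seed=seed)
    summaries = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # Representative = the document closest to the cluster center.
        rep = min(members, key=lambda i: sq_dist(vectors[i], centers[c]))
        summaries.append(summarize(docs[rep]))
    return " ".join(summaries)
```

Calling `summarize_corpus(docs, k=10, summarize=my_model_fn)` with your own `my_model_fn` gives one concatenated corpus summary; the number of clusters `k` roughly controls how long it gets.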

Sebastian Ruder keeps track of NLP progress in summarization here:


You can try out some of the approaches here.

Very happy to find this thread!

I’m currently working on a project where I’m trying to take a number of text fields as well as some categorical tabular fields and produce a coherent summary from them. This is for a not-for-profit, and today it consumes quite a lot of volunteer hours just doing this by hand.

My challenge is that they only have about 1000 examples of this going well, so I’m looking to see if I can do transfer learning by first building a summarization model on the CNN/Daily Mail (cnndm) dataset and then fine-tuning it on their 1000 cases for the domain. For now I’m “hacking” the mix of tabular & text data by artificially turning the tabular fields into text. I’m sure there’s a better approach :slight_smile:
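
As an illustration of the "tabular fields into text" hack, here is one way it might look. This is a sketch only: the `record_to_text` helper and the field names in the usage example are hypothetical, not taken from the actual project.

```python
def record_to_text(record, text_fields, categorical_fields):
    """Flatten one mixed tabular/text record into a single input string.

    Categorical fields become short "name: value." phrases placed in front
    of the free-text fields, so a text-only summarizer can see them.
    """
    parts = [
        f"{name.replace('_', ' ')}: {record[name]}."
        for name in categorical_fields
        if record.get(name) not in (None, "")
    ]
    parts += [
        str(record[name]).strip()
        for name in text_fields
        if record.get(name)
    ]
    return " ".join(parts)


# Hypothetical example record
example = {
    "case_type": "housing",
    "urgency": "high",
    "notes": "Client needs help with a lease dispute.",
}
print(record_to_text(example, ["notes"], ["case_type", "urgency"]))
# case type: housing. urgency: high. Client needs help with a lease dispute.
```

Keeping the field names in the string gives the model a consistent cue for each value, and missing fields are simply skipped.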

Best I can tell, as of today there aren’t libraries for seq-to-seq learners. Can anyone confirm that? I’ve made it through the first Deep Learning for Coders class but am not familiar enough with PyTorch yet to build one myself. I’ll likely continue on the second class and see if I can figure out how to build such a thing if no one else has done it!

There’s also the issue of scoring these things which is pretty ugly in the text summarization world. Would love any suggestions.
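
On scoring: the usual starting point for summarization is ROUGE, which counts n-gram overlap between a generated summary and a reference. Below is a simplified ROUGE-N sketch (whitespace tokenization, no stemming, one reference); the `rouge_n` function is written for illustration and is not a replacement for an established ROUGE package.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N precision, recall and F1 for one summary pair."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_n("the cat sat", "the cat sat on the mat")
print(scores["precision"], scores["recall"])  # 1.0 0.5
```

ROUGE rewards copying reference wording, so it tends to undersell good abstractive summaries; it is best paired with some human spot-checking.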

Hello, I just found these papers. Hope they help!

The MASS model also supports text summarization.
