Seq2Seq - Pretrain Encoder?

Hello.

I recently completed the Practical Deep Learning For Coders (2020) course.

In lesson 10, we are taught that pre-training a language model on a corpus of text, then saving its encoder & vocab for use in a classification task (sentiment analysis), yields better results than simply starting from a randomly initialized encoder & vocab.
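(For reference, the lesson 10 recipe I'm referring to looks roughly like this; it's a sketch from memory and the dataframe/column/file names are just placeholders:)

```python
from fastai.text.all import *

# Fine-tune a language model on the (unlabeled) text corpus, then keep its encoder
dls_lm = TextDataLoaders.from_df(df_texts, text_col='text', is_lm=True, valid_pct=0.1)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, metrics=Perplexity())
learn_lm.fine_tune(5)
learn_lm.save_encoder('finetuned_enc')

# The classifier reuses the saved encoder and the LM's vocab
dls_clas = TextDataLoaders.from_df(df_labeled, text_col='text', label_col='label',
                                   text_vocab=dls_lm.vocab)
learn_clas = text_classifier_learner(dls_clas, AWD_LSTM)
learn_clas.load_encoder('finetuned_enc')
learn_clas.fine_tune(3)
```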

I would like to apply the same logic to a seq2seq task.

I have a limited dataset with both the source and target sequences I would like the model to learn. I do, however, have a ton of data that contains only source sequences.

I'm wondering if anyone has pretrained an encoder by building a language model on just the source-sequence data, and then fine-tuned a seq2seq model on the paired data using that pre-trained encoder?
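To make the idea concrete, here's a rough sketch of what I'm picturing. Everything below is my own placeholder code rather than fastai API (apart from the AWD_LSTM / Learner calls), `src_only_df` and `trg_vocab` are hypothetical names, and I'm assuming the default AWD_LSTM hyperparameters so the saved encoder state dict loads cleanly:

```python
import torch, torch.nn as nn
from fastai.text.all import *

# 1) Pretrain a language model on the source-only corpus (same recipe as the classification case)
dls_src_lm = TextDataLoaders.from_df(src_only_df, text_col='source', is_lm=True, valid_pct=0.1)
learn_lm = language_model_learner(dls_src_lm, AWD_LSTM, pretrained=False,  # pretrained=True if the source is natural English
                                  metrics=Perplexity())
learn_lm.fit_one_cycle(10, 3e-3)
learn_lm.save_encoder('src_encoder')   # saves the AWD_LSTM state dict, typically to models/src_encoder.pth

# 2) A minimal seq2seq wrapper whose encoder matches the pretrained LM's shape
class Seq2Seq(nn.Module):
    def __init__(self, src_vocab_sz, trg_vocab_sz, emb_sz=400, n_hid=1152, n_layers=3):
        super().__init__()
        # hyperparameters must match the LM's AWD_LSTM (these are fastai's defaults)
        self.encoder = AWD_LSTM(src_vocab_sz, emb_sz, n_hid, n_layers)
        self.trg_emb = nn.Embedding(trg_vocab_sz, emb_sz)
        self.decoder = nn.LSTM(emb_sz, emb_sz, batch_first=True)
        self.out = nn.Linear(emb_sz, trg_vocab_sz)

    def forward(self, src, trg):
        self.encoder.reset()                           # AWD_LSTM carries hidden state across calls
        enc_out = self.encoder(src)                    # (bs, src_len, emb_sz)
        h0 = enc_out[:, -1].unsqueeze(0).contiguous()  # condition decoder on last encoder state (ignoring padding for simplicity)
        c0 = torch.zeros_like(h0)
        dec_out, _ = self.decoder(self.trg_emb(trg), (h0, c0))
        return self.out(dec_out)                       # (bs, trg_len, trg_vocab_sz)

# 3) Load the pretrained encoder weights, then fine-tune on the paired source/target data.
#    The source side of the paired data must be numericalized with dls_src_lm.vocab so the
#    embedding rows line up; trg_vocab would be built separately from the target sequences.
model = Seq2Seq(len(dls_src_lm.vocab), len(trg_vocab))
model.encoder.load_state_dict(torch.load('models/src_encoder.pth'))
```

Conditioning the decoder only on the last encoder state is just the simplest wiring I could think of; attention over `enc_out` would presumably work better, but my question is really about the encoder pretraining step.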

Any examples of this using the AWD_LSTM or vanilla LSTM would be appreciated!!