Exploring Style Transfer for Text

Introduction and Problem Description

In lectures 8 and 9, we discussed the problem of transferring the style from one image to another. The problem of style transfer in the context of language is a natural follow-up question.

Let us define the goals before we think more about the problem. What does it even mean to transfer the style from one body of text to another?

I think there are primarily three ways in which one could transform the style of a given piece of text. Let us fix the notation before listing these.

Let T_in be the given input text, and C be the corpus from which the style is to be transferred, producing T_C, the output text with the style derived from C.

1. Lexical transfer

  • Substitute some words in T_in with synonyms from C. The synonyms in this case don’t have to map one-to-one in meaning; anything that fits the context will do. It is tempting to conclude that one could learn a mapping of synonyms from C to T_in, but that boils down to a simpler version of the problem of translation.

2. Grammar transfer

  • Rearrange the words in T_in to be compliant with the grammatical structure of C. For example, if C was a set of sentences generated by Yoda, we expect the output to follow an object-subject-verb order. Making sense, am I?
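As a toy illustration of grammar transfer, suppose the input has already been parsed into role-tagged phrases (a real system would need a syntactic parser for this step); reordering to Yoda's object-subject-verb order then becomes a simple rearrangement. All names below are hypothetical sketch code, not part of the approach described later in this post.

```python
# Toy sketch: grammar transfer as reordering of role-tagged phrases.
# Assumes a parser has already produced (role, phrase) pairs.

def reorder(tagged_phrases, target_order):
    """Rearrange tagged phrases to match a target role order."""
    by_role = dict(tagged_phrases)
    return " ".join(by_role[role] for role in target_order if role in by_role)

# English is subject-verb-object; Yoda-style output is object-subject-verb.
svo = [("S", "you"), ("V", "must learn"), ("O", "patience")]
print(reorder(svo, ["O", "S", "V"]))  # patience you must learn
```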

3. Semantics transfer

  • Completely change the text but preserve the meaning. I think this reduces to the problem of translation.

A First Attempt at Lexical Transfer

I have taken a stab at textual style transfer of the first kind described above. Recall that T_in is the input text, and C is the style corpus from which the style has to be transferred to T_in.

The high-level idea is as follows:

  1. We train a generative model, G, on the given corpus C. G is thus capable of generating sequences of text that (hopefully) appear to be sampled from C.

  2. Preprocess T_in and delete some of the words. Let the processed input be T_in’.

  3. Feed T_in’ to G from left to right. G will now start “filling in the blanks”, with the words still present in T_in’ guiding the generated words. To borrow terminology from lectures 8 and 9, the generator G will be responsible for generating the style, and the non-deleted words will provide the content.
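The decoding loop of step 3 can be sketched as follows. Here `sample_next_word` is a hypothetical placeholder for the trained generator G (a real implementation would sample from G's next-word distribution conditioned on the context); only the loop structure reflects the approach described above.

```python
# Sketch of the "fill in the blanks" decoding loop (step 3).

def sample_next_word(context):
    # Placeholder: a real implementation would sample from G's
    # next-word distribution conditioned on `context`.
    return "<generated>"

def fill_blanks(tokens, blank="_"):
    """Walk T_in' left to right; keep surviving words as content,
    and let G supply the style wherever a word was deleted."""
    output = []
    for token in tokens:
        if token == blank:
            output.append(sample_next_word(output))  # G fills the blank
        else:
            output.append(token)  # non-deleted word guides generation
    return output

t_in_prime = "can Spring be _ behind ?".split()
print(" ".join(fill_blanks(t_in_prime)))  # can Spring be <generated> behind ?
```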


As the generator, G, I used a single LSTM, adapted from one of the Keras examples (with some changes to the model parameters). Two different corpora were used for training G.

Input text (T_in)

“IF WINTER comes,” the poet Shelley asked, “can Spring be far behind?” For the best part of a decade the answer as far as the world economy has been concerned has been an increasingly weary “Yes it can”. Now, though, after testing the faith of the most patient souls with glimmers that came to nothing, things seem to be warming up. It looks likely that this year, for the first time since 2010, rich-world and developing economies will put on synchronised growth spurts.

(Taken from this article)


The words that have been removed are replaced with a _.

“IF WINTER comes”, the poet Shelley asked, “can Spring be _ behind?” For the best part of a decade the _ as far as the world economy has been concerned has been an _ weary “Yes it can”. Now, though, after testing the faith of the most _ souls with _ that came to nothing, things seem to be warming up. It looks _ that this year, for the _ time since 2010, rich-world and _ economies will put on synchronised growth spurts.

C = Nietzsche

(G trained using works of Nietzsche)

C = Shakespeare

(G trained using works of Shakespeare)



I was expecting better results. Some of the top items on my to-do list include:

  1. Tweaking the model

  2. Being smarter about the words to be removed from T_in. Perhaps only those words should be removed whose contexts are likely to be present in C?

  3. Can the GAN framework help here?

  4. Explore transfer of the second kind (using parts of speech, perhaps).

Please share any feedback on the approach or pointers to any related work.

If you are attending the course in person, are interested in an NLP project, and would be interested in extending this work, please feel free to ping me; we can work on it together.

Thanks for reading.


Perhaps a narrower problem to solve would be: active vs passive voice.

I know journalism is typically written in the passive voice.

The main underlying goal is to “style transfer” / automatically convert an informative newspaper article into a more persuasive essay.


I think for lexical transfer, you may not need anything more than Word2Vec itself.

If instead of training a Word2Vec model only on English you trained it on English and Spanish, it would not only learn the relationship between man and woman in English, but would also learn that relationship in Spanish, and analogies across languages would work too (i.e. rudimentary word-to-word translation).

You could also use a different embedding for each language and try to learn a transformation from one to the other (it should be possible to approximate this with a single linear transformation plus a bias, for words that have similar relationships in both languages).
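That linear-transformation-plus-bias idea can be sketched with least squares on a handful of anchor word pairs known in both spaces. The embeddings below are random toy vectors standing in for two trained Word2Vec models, so this only demonstrates the fitting step, not real translation.

```python
# Sketch: learn a single linear map + bias between two embedding
# spaces from anchor word pairs, via least squares. Toy data only;
# real embeddings would come from two trained Word2Vec models.

import numpy as np

rng = np.random.default_rng(0)
d = 5                        # embedding dimension
n_pairs = 50                 # anchor word pairs (e.g. a small dictionary)

X = rng.normal(size=(n_pairs, d))      # embeddings in space 1
W_true = rng.normal(size=(d, d))       # "true" map, for the toy demo
b_true = rng.normal(size=d)
Y = X @ W_true + b_true                # embeddings in space 2

# Append a column of ones so the bias is learned jointly with W.
X1 = np.hstack([X, np.ones((n_pairs, 1))])
Wb, *_ = np.linalg.lstsq(X1, Y, rcond=None)
W, b = Wb[:-1], Wb[-1]

# A new word in space 1 can now be mapped into space 2.
new_word = rng.normal(size=d)
mapped = new_word @ W + b
print(np.allclose(mapped, new_word @ W_true + b_true))  # True
```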

A single LSTM is not going to work well; you’ll need more layers and attention (-> seq2seq) to start getting reasonable results (I think Google used 4xLSTM(1024) for their example). For word-to-word mappings, though, good embeddings will get you most of the way there.

I’m betting we’ll learn a lot more about this stuff in the upcoming classes.


Very interesting topic! Interested to see where it ends up…

@janardhanp22 you might be interested in this.

Thanks @sravya8 for pointing me to this thread. @amanmadaan I am interested and am working on a similar use case, where my problem statement is roughly as follows:

  1. Content: Corpus of text (basically an article/documentation)
  2. Style : Style sheet with following basic rules:
    a. Margin width 5
    b. Border = black
    c. Stop-words count <= 25
  3. Output : Generated Text Document having content plus the style.

Slightly related, I’m curious if we can build a joke generator using RNN/LSTM/GAN/RL or some combination.

I found some decent datasets, including these on Kaggle.

And there appears to be some new progress of late in text generation.


Fun project - I plan to explore some variation of it at some point, but in the meantime here are two papers (one published) you might like:



@anamariapopescug It’s extremely relevant, thanks. I note that it’s a research proposal; it will be interesting to see the progress.

EDIT: The second paper actually gives interesting results and solution approaches, thanks!

@brendan Interesting. This is also an open item in OpenAI’s requests for research :slight_smile:

@davecg I think this will require a parallel corpus to train (the same piece of text written both in the corpus from which T_in is derived and in C).


@janardhanp22 Isn’t this a more deterministic task?

@amanmadaan What do you mean by a deterministic task? Do you mean a rule-based system?

Yes, the 2nd paper looks like a student paper, but they’re further along. The 1st paper I mostly sent for the general potential approach and its references … someone has a whole research agenda sketched out there :slight_smile:

For seq2seq, definitely. If you’re just trying to map words to words, though, it shouldn’t - as long as the contexts of the words are similar in each corpus.

Man - woman + queen should equal king in both corpora, etc etc.

I think what you would need to solve for is an N-dimensional affine transform from space 1 to space 2 that preserves those types of relationships.

You could definitely do that with a set of analogies in both languages, but I’d bet a clever loss function could do it too without that.

Training Char-RNNs for Transferring Name Styles: http://madaan.github.io/names

I’m doing the online part 2 (2018) of the course and came up with a similar idea.
Do you have any additional progress, @amanmadaan?
My use case was to convert one text into another text structure. Like

but not necessarily change the text, only restructure it to fit into a paragraph structure.
If you think it is possible, what do you think the training set should look like?

It’s been a while (> 3 years!) since I created this post, but I finally did something along related lines with an amazing team. Please find more details at: https://www.ml.cmu.edu/news/news-archive/2020/june/politeness-transfer-a-tag-and-generate-approach.html, paper: https://arxiv.org/abs/2004.14257 @anamariapopescug @Kasianenko @jeremy


Huge congrats @amanmadaan! What were some of your key learnings and discoveries over the last 3 years? Can you tell us something about the process?


I got a chance to ask Jeremy’s question (along with a lot of my stupid questions) to Aman and interview him about his journey :tea::

Audio, Video