Generating text with ULMFiT

I’m reading the source of your ULMFiT model. To understand how it works, I’m trying to use it for simple language modeling: I provide a few words and let the AWD-LSTM guess the next one.

However, this works really badly, and I wonder whether my approach is just broken. My code is fairly small:

from fastai import *
from fastai.text import *
import numpy as np
import torch
import torch.nn.functional as F

path = untar_data(URLs.IMDB)

# taken from the ULMFiT IMDB sample
data_lm = text_data_from_csv(path, data_func=lm_data)
learn = RNNLearner.language_model(data_lm, pretrained_fnames=['lstm_wt103', 'itos_wt103'])

# mapping between vocabulary and indices
itos = data_lm.vocab.itos

#text = "It is raining again. Recently the weather has become so"
#text = "I'm reading a book. It is great, I enjoy it very"
#text = "This new dress really suits"
text = "The restaurant has recently opened . People are friendly but food is"

# convert my text to indices
# 0 corresponds to unknown
words = text.lower().split(" ")
indices = [itos.index(word) if word in itos else 0 for word in words]
words = [itos[idx] for idx in indices]

# convert the data to a batch, sequence along first dimension
batch = np.array(indices).reshape((-1,1))
batch = torch.tensor(batch).cuda()

# use AWD-LSTM to predict
preds = learn.model(batch)
# everything but output 0 is just meta-information
preds = preds[0]
log_probs = F.log_softmax(preds, 1)

# switch over to numpy
log_probs = log_probs.detach().cpu().numpy()
log_probs = log_probs[-1]
highest_probs = log_probs.argsort()[-40:][::-1]

guessed_words = [itos[idx] for idx in highest_probs]

By uncommenting the different texts, you can try different samples.
For example:
“This new dress really suits” produces
['.', ',', 'on', 'in', 'the', 'a', 'and', 'with', '"', ';', 'from', "'", 'that', 'xxunk', ':', 'to', ...

I would have expected a pronoun, such as “you”, “her”, “me”, …
The other examples are also not really convincing.

I had the impression that the model is just learning a general bias towards common words. The paper on AWD-LSTM reports very competitive perplexity values, so I would have expected it to perform better than this. So my question is: is my approach broken?
My first intuition was that I didn’t set up the hidden state properly. But after debug-stepping through your implementation, I don’t think that this is the case. The hidden states are initialized in reset() and then just updated for every entry in the sequence.

For fun, I used the setup to sample some more values. The model is given the following sentence:

Winter is coming after all. The weather has become horrible. All week it has been raining and today it

This is fed to the model, the highest-scoring prediction is appended to the sentence, and the process repeats. The model completes the sentence:

Winter is coming after all. The weather has become horrible. All week it has been raining and today it has become a major success.

One after the other, the tokens “has”, “become”, “a”, “major”, “success”, “.” are predicted.
The grammar is nice and it even ends the sentence. So I guess my impression that it just learns a bias towards common words is false. But there is no context awareness at all.
Is this how it should behave?
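
For reference, the feed, predict, append loop described above can be sketched roughly as follows. This is only an illustration: `TOY_BIGRAMS`, `next_log_probs` and `greedy_generate` are hypothetical names, and the toy bigram table stands in for the real AWD-LSTM, which would condition on the whole prefix rather than only the last token.

```python
import math

# Hypothetical toy "language model": maps the previous token to
# next-token probabilities. It is a stand-in for the real network.
TOY_BIGRAMS = {
    "it": {"has": 0.6, "is": 0.4},
    "has": {"become": 0.7, "been": 0.3},
    "become": {"a": 0.8, "horrible": 0.2},
    "a": {"major": 0.6, "success": 0.4},
    "major": {"success": 0.9, ".": 0.1},
    "success": {".": 1.0},
}

def next_log_probs(tokens):
    """Return {token: log-probability} for the token following `tokens`."""
    dist = TOY_BIGRAMS.get(tokens[-1], {".": 1.0})
    return {tok: math.log(p) for tok, p in dist.items()}

def greedy_generate(tokens, max_new_tokens=10, stop_token="."):
    """Repeatedly append the single most likely next token."""
    tokens = list(tokens)
    for _ in range(max_new_tokens):
        log_probs = next_log_probs(tokens)
        best = max(log_probs, key=log_probs.get)  # arg-max over the vocabulary
        tokens.append(best)
        if best == stop_token:
            break
    return tokens

print(" ".join(greedy_generate(["today", "it"])))
# prints: today it has become a major success .
```

Because only the top-scoring token is kept at every step, the loop never revisits an earlier choice, which is one reason the output can drift toward generic continuations.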


I believe this is exactly what Jeremy found and discussed in the lecture on generative modeling found here:

You are using a greedy search, which tends to produce this behavior. He discusses other methods, such as beam search, in the lecture. Perhaps it will be helpful.
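
To make the difference concrete, here is a minimal sketch of beam search next to greedy search. It is not the code from the lecture: `TOY_MODEL`, `next_log_probs` and `beam_search` are hypothetical names, and the tiny probability table is invented purely so that the two strategies disagree. A beam width of 1 is the same as greedy search.

```python
import math

# Hypothetical toy model, invented for illustration: next-token
# probabilities conditioned only on the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.35, "bird": 0.25},
    "a": {"storm": 0.9, "cat": 0.1},
}

def next_log_probs(tokens):
    """Return {token: log-probability} for the token following `tokens`."""
    dist = TOY_MODEL.get(tokens[-1], {".": 1.0})
    return {tok: math.log(p) for tok, p in dist.items()}

def beam_search(prefix, beam_width=2, steps=2):
    """Keep the `beam_width` best partial sequences at every step."""
    beams = [(0.0, list(prefix))]  # (cumulative log-probability, tokens)
    for _ in range(steps):
        candidates = []
        for score, tokens in beams:
            for tok, lp in next_log_probs(tokens).items():
                candidates.append((score + lp, tokens + [tok]))
        # Sort by total score and prune to the best `beam_width` sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams

greedy = beam_search(["<s>"], beam_width=1)[0][1]
beam = beam_search(["<s>"], beam_width=2)[0][1]
print("greedy:", " ".join(greedy))
print("beam:  ", " ".join(beam))
```

In this toy table, greedy search commits to "the" because it is the likeliest first token, while a beam of width 2 also keeps "a" alive and ends up with the higher-probability sequence overall.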