I’m reading the source of your ULMFiT model. To understand how it works, I’m trying to use it for simple language modeling: I provide a few words and let the AWD-LSTM guess the next one.
However, this works really badly, and I wonder whether my approach is simply broken. My code is fairly small:
```python
from fastai import *
from fastai.text import *
import torch

path = untar_data(URLs.IMDB)  # taken from the ULMFiT IMDB sample
data_lm = text_data_from_csv(path, data_func=lm_data)
datasets.download_wt103_model()
learn = RNNLearner.language_model(data_lm, pretrained_fnames=['lstm_wt103', 'itos_wt103'])

# mapping between vocabulary and indices
itos = data_lm.vocab.itos

#text = "It is raining again. Recently the weather has become so"
#text = "I'm reading a book. It is great, I enjoy it very"
#text = "This new dress really suits"
text = "The restaurant has recently opened . People are friendly but food is"

# convert my text to indices; 0 corresponds to unknown
words = text.lower().split(" ")
indices = [itos.index(word) if word in itos else 0 for word in words]
words = [itos[idx] for idx in indices]

# convert the data to a batch, sequence along first dimension
batch = np.array(indices).reshape((-1, 1))
batch = torch.tensor(batch).cuda()

# use AWD-LSTM to predict
learn.model.reset()
preds = learn.model(batch)
# everything but output 0 is just meta-information
preds = preds[0]
log_probs = F.log_softmax(preds, dim=1)

# switch over to numpy
log_probs = log_probs.detach().cpu().numpy()
log_probs = log_probs[-1]
highest_probs = log_probs.argsort()[-40:][::-1]
guessed_words = [itos[idx] for idx in highest_probs]
print(guessed_words)
```
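As a side note on the lookup step: `itos.index(word)` scans the whole vocabulary list for every word, which is O(n) per lookup. The same mapping can be built once as a dict. A minimal self-contained sketch of that idea, using a toy vocabulary in place of `data_lm.vocab.itos` (the names here are illustrative, not fastai's API):

```python
# toy vocabulary; in the real code this would be data_lm.vocab.itos,
# where index 0 is the unknown token
itos = ["xxunk", "the", "food", "is", "friendly"]

# build the reverse mapping once: word -> index
stoi = {word: idx for idx, word in enumerate(itos)}

text = "the food is amazing"
words = text.lower().split(" ")

# unknown words ("amazing" here) fall back to index 0
indices = [stoi.get(word, 0) for word in words]
print(indices)  # [1, 2, 3, 0]
```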
By uncommenting the different texts, you can try different samples.
“This new dress really suits” produces
['.', ',', 'on', 'in', 'the', 'a', 'and', 'with', '"', ';', 'from', "'", 'that', 'xxunk', ':', 'to', ...
I would have expected a pronoun, such as “you”, “her”, “me”, …
The other examples are also not really convincing.
I had the impression that the model just learns a general bias towards common words. The AWD-LSTM paper reports very competitive perplexity values, so I would have expected to get better results than this. My question is: is my approach broken?
My first intuition was that I didn't set up the hidden state properly. But after debug-stepping through your implementation, I don't think that is the case: the hidden states are initialized in reset() and then updated for every entry in the sequence.
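The pattern described above (state initialized once by reset(), then carried forward token by token) can be illustrated with a toy stateful model. This is only a sketch of the mechanism, not fastai's actual implementation; the class and its update rule are made up for illustration:

```python
class ToyStatefulModel:
    """Illustrates reset-once, update-per-token hidden-state handling."""

    def __init__(self):
        self.hidden = None

    def reset(self):
        # analogous to learn.model.reset(): start from a fresh zero state
        self.hidden = 0

    def step(self, token_id):
        # the hidden state accumulates information from every token seen so far
        self.hidden = self.hidden * 2 + token_id
        return self.hidden

model = ToyStatefulModel()
model.reset()
outputs = [model.step(t) for t in [1, 0, 1]]
print(outputs)  # [1, 2, 5] -- each output depends on the whole prefix
```

The key property, which an LSTM shares, is that each step's output depends on the entire prefix through the hidden state, so forgetting to call reset() between unrelated sequences would leak state from one into the next.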
For fun, I used the setup to sample some more values. The model is given the following sentence:
Winter is coming after all. The weather has become horrible. All week it has been raining and today it
This is fed to the model, the highest-probability prediction is appended to the sentence, and the process repeats. The model completes the sentence:
Winter is coming after all. The weather has become horrible. All week it has been raining and today it has become a major success.
One after the other, the tokens “has”, “become”, “a”, “major”, “success”, “.” are predicted.
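The loop described above is plain greedy decoding: score the sequence, take the argmax token, append it, repeat. A minimal self-contained sketch, with a canned stand-in scoring function where the real code would call learn.model on the tokenized batch (the vocabulary and scores here are fabricated to mirror the example completion):

```python
itos = ["xxunk", "it", "has", "become", "a", "major", "success", "."]

def next_token_scores(indices):
    # stand-in for the language model: maps the current suffix to a
    # one-hot score vector over the vocabulary
    canned = {(): 2, (2,): 3, (2, 3): 4, (2, 3, 4): 5,
              (2, 3, 4, 5): 6, (2, 3, 4, 5, 6): 7}
    best = canned.get(tuple(indices[-5:]), 7)
    return [1.0 if i == best else 0.0 for i in range(len(itos))]

indices = []  # would start from the tokenized prompt
for _ in range(10):
    scores = next_token_scores(indices)
    best = max(range(len(scores)), key=scores.__getitem__)  # greedy argmax
    indices.append(best)
    if itos[best] == ".":  # stop once the sentence is ended
        break

print(" ".join(itos[i] for i in indices))  # has become a major success .
```

Greedy decoding always picks the single most likely next token, which tends to produce bland, high-frequency continuations; sampling from the distribution (or beam search) usually gives more varied completions.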
The grammar is fine, and it even ends the sentence properly. So my impression that it only learns a bias towards common words seems wrong, but there is no context awareness at all.
Is this how it should behave?