Interesting read on neural text [de]generation

Hi everyone,

Just wanted to share this interesting paper I was reading about neural text generation. Having trained a bunch of models for a Bengali generation/summarization/translation project, I ran into this problem a lot: getting the models to produce coherent sentences was not easy, and I was at a loss for how to improve.

The paper goes through likelihood maximization and stochastic decoding methods, and also details the authors' own contribution, called Nucleus Sampling. It's a sort of dynamic top-k method, where the set of candidate tokens changes at each sampling step: you keep only the smallest set of most likely tokens whose cumulative probability exceeds a threshold, and sample from that.
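For intuition, here's a minimal, framework-free sketch of the idea. The distribution and cutoff below are made up for illustration; this is not the paper's code.

```python
import random

def nucleus_sample(probs, top_p=0.9):
    """Sample an index from `probs`, restricted to the smallest set of
    highest-probability entries whose cumulative mass exceeds `top_p`."""
    # Sort indices by descending probability
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum > top_p:
            break
    # Renormalise within the nucleus and draw one index
    total = sum(probs[i] for i in nucleus)
    r = random.uniform(0, total)
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]

probs = [0.5, 0.3, 0.1, 0.05, 0.05]
print(nucleus_sample(probs, top_p=0.7))  # only indices 0 and 1 can be drawn
```

With top_p=0.7 the nucleus is just the two most likely tokens here; with a flatter distribution it would grow, which is the "dynamic" part compared to a fixed top-k.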


Thought this might be interesting to some of you, given that NLP is a big reason why a lot of us got started in the first place.

Kind regards,


The paper is interesting, thank you for sharing. I ended up implementing nucleus sampling in fastai by adding a new method to the learner. I don't see a big difference in the results, at least for the type of text I'm training on, but I need to look more carefully at a larger number of samples. Also, this is the first NLP project I've done, so I wouldn't trust anything I say.

The added method (defaults to 90% cumulative probability as used in the paper’s example):

def nucleus_predict(self, text:str, n_words:int=1, top_p:float=0.9, sep:str=' ', decoder=decode_spec_tokens):
    "Return `text` plus `n_words` sampled from the smallest set of tokens whose cumulative probability exceeds `top_p`."
    ds = self.data.single_dl.dataset
    self.model.reset()
    xb, yb = self.data.one_item(text)
    new_idx = []
    for _ in range(n_words):
        res = self.pred_batch(batch=(xb,yb))[0][-1]
        # Sort descending so the nucleus is the smallest set of most likely tokens
        probs, idxs = torch.sort(res, descending=True)
        nucleus, cum = [], 0.
        for p in probs:
            nucleus.append(p)
            cum += p.item()
            if cum > top_p: break
        # Sample within the nucleus, then map back to the vocab index
        choice = torch.multinomial(torch.stack(nucleus), 1).item()
        idx = idxs[choice].item()
        new_idx.append(idx)
        xb = xb.new_tensor([idx])[None]
    return text + sep + sep.join(decoder(self.data.vocab.textify(new_idx, sep=None)))

I suppose if someone were to PR this, it would be better to just fold it into predict() when top_p is passed, but I don't think it has been requested; this was just a quick prototype.
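As a sketch of what that integration could look like (pure Python, function name and values hypothetical): an optional top_p argument could simply zero out everything outside the nucleus before the existing sampling step, leaving behaviour unchanged when it isn't passed.

```python
def filter_top_p(probs, top_p=None):
    """Zero out probabilities outside the nucleus; identity when `top_p` is None.
    A generic sketch of folding top-p filtering into an existing predict() path."""
    if top_p is None:
        return probs
    # Walk tokens in descending probability until the cumulative mass exceeds top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum > top_p:
            break
    return [p if i in keep else 0.0 for i, p in enumerate(probs)]

print(filter_top_p([0.5, 0.3, 0.1, 0.05, 0.05], top_p=0.7))
# [0.5, 0.3, 0.0, 0.0, 0.0]
```

Since torch.multinomial accepts unnormalised weights, the filtered distribution could be sampled directly without renormalising.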
