Interesting read on neural text [de]generation

Hi everyone,

Just wanted to share this interesting paper I was reading about neural text generation. Having trained a bunch of models for a Bengali generation/summarization/translation project, I ran into this problem a lot: getting the models to produce coherent sentences was not easy, and I was at a loss for how to improve.

The paper goes through likelihood maximization and stochastic decoding methods, and also details the authors' own contribution, called Nucleus Sampling. It's a sort of dynamic top-k method, where the set of candidate tokens changes at each sampling step: you keep only the smallest set of most likely tokens whose cumulative probability exceeds a threshold, and sample from that.
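For intuition, here's a minimal, framework-free sketch of the idea. The distribution and cutoff below are made up for illustration; this is not the paper's code.

```python
import random

def nucleus_sample(probs, top_p=0.9):
    """Sample an index from `probs`, restricted to the smallest set of
    highest-probability entries whose cumulative mass exceeds `top_p`."""
    # Sort indices by descending probability
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum > top_p:
            break
    # Renormalise within the nucleus and draw one index
    total = sum(probs[i] for i in nucleus)
    r = random.uniform(0, total)
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]

probs = [0.5, 0.3, 0.1, 0.05, 0.05]
print(nucleus_sample(probs, top_p=0.7))  # only indices 0 and 1 can be drawn
```

With top_p=0.7 the nucleus is just the two most likely tokens here; with a flatter distribution it would grow, which is the "dynamic" part compared to a fixed top-k.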


Thought this might be interesting to some of you, given that NLP is a big reason why a lot of us got started in the first place.

Kind regards,


The paper is interesting, thank you for sharing. I ended up implementing nucleus sampling in fastai by adding a new method to the learner. I don't see a big difference in the results, at least for the type of text I'm training on, but I need to look more carefully at a larger number of samples. Also, this is the first NLP project I've done, so I wouldn't trust anything I say.

The added method (defaults to 90% cumulative probability as used in the paper’s example):

def nucleus_predict(self, text:str, n_words:int=1, top_p:float=0.9, sep:str=' ', decoder=decode_spec_tokens):
    "Return `text` plus `n_words` sampled from the smallest set of tokens whose cumulative probability exceeds `top_p`."
    ds = self.data.single_dl.dataset
    self.model.reset()
    xb, yb = self.data.one_item(text)
    new_idx = []
    for _ in range(n_words):
        res = self.pred_batch(batch=(xb,yb))[0][-1]
        # Sort descending so the nucleus is the smallest set of most likely tokens
        probs, idxs = torch.sort(res, descending=True)
        nucleus, cum = [], 0.
        for p in probs:
            nucleus.append(p)
            cum += p.item()
            if cum > top_p: break
        # Sample within the nucleus, then map back to the vocab index
        choice = torch.multinomial(torch.stack(nucleus), 1).item()
        idx = idxs[choice].item()
        new_idx.append(idx)
        xb = xb.new_tensor([idx])[None]
    return text + sep + sep.join(decoder(self.data.vocab.textify(new_idx, sep=None)))

I suppose if someone were to PR this, it would be better to just fold it into predict() when top_p is passed, but I don't think it has been requested; this was just a quick prototype.
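As a sketch of what that integration could look like (pure Python, function name and values hypothetical): an optional top_p argument could simply zero out everything outside the nucleus before the existing sampling step, leaving behaviour unchanged when it isn't passed.

```python
def filter_top_p(probs, top_p=None):
    """Zero out probabilities outside the nucleus; identity when `top_p` is None.
    A generic sketch of folding top-p filtering into an existing predict() path."""
    if top_p is None:
        return probs
    # Walk tokens in descending probability until the cumulative mass exceeds top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum > top_p:
            break
    return [p if i in keep else 0.0 for i, p in enumerate(probs)]

print(filter_top_p([0.5, 0.3, 0.1, 0.05, 0.05], top_p=0.7))
# [0.5, 0.3, 0.0, 0.0, 0.0]
```

Since torch.multinomial accepts unnormalised weights, the filtered distribution could be sampled directly without renormalising.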
