How to get probability of a sentence

Using ULMFiT, how do I calculate the probability of a sentence?

I know how to get the most likely next word, but where can I find the probability?

nexts = torch.topk(res[-1], 3)[1]  # Gives the indexes of the 3 most likely words, in order
w = itos[nexts[0].data[0]]         # Get the most likely word

I think these are some kind of word scores, but they don't look like probabilities:

res[-1][nexts[0].data[0]]

For example:

print(res[-1][nexts[0].data[0]], res[-1][nexts[1].data[0]], res[-1][nexts[2].data[0]])

gives: 18.6216, 9.8292, 8.5659

That makes sense, but how do I convert these numbers to probabilities?

I would also like to know how we can get the probability of each word, since I need it for beam search. If res[-1] really is some kind of word scoring, maybe we could use softmax to convert the values to probabilities, something like nn.Softmax(res[-1]), but since I am a beginner in PyTorch I am not sure whether that is correct.


I would also like to know how it works.

This makes sense to me, though I don't know PyTorch well either. Shouldn't the softmax be over all possible words somehow?

Yes, nn.Softmax(res[-1]) will calculate the probability of every word in the vocabulary, since res[-1] still contains the scores of all words. That is different from the result of torch.topk(res[-1], X), which contains only the X biggest values and their indexes.
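For example, a minimal sketch of that idea, assuming res[-1] holds the raw scores over the whole vocabulary and itos maps indexes back to tokens (as in the snippets above), and using the functional F.softmax rather than the nn.Softmax module:

import torch
import torch.nn.functional as F

probs = F.softmax(res[-1], dim=-1)           # turn the raw scores into probabilities that sum to 1
top_probs, top_idxs = torch.topk(probs, 3)   # top 3 probabilities and their vocabulary indexes
for p, i in zip(top_probs, top_idxs):
    print(itos[int(i)], float(p))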

I also just read a similar question elsewhere: https://discuss.pytorch.org/t/how-to-extract-probabilities/2720/11

So then we can use a script something like the final one on https://nlpforhackers.io/language-models/ to generate a final probability, right?
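If I understand it correctly, the idea is just the chain rule: multiply the model's probability of each word given everything before it, e.g. P(the cat sat) = P(cat | the) * P(sat | the cat).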

I’m going to try this anyway.

This seems to work. Thanks @cahya. @Boone you might be interested.

Any obvious mistakes?

import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable
# Tokenizer, partition_by_cores and stoi come from the usual fastai (v0.7) notebook setup

def calc_prob(model, text):
    texts = ['xbos xfld 1 ' + text]
    tokens = Tokenizer().proc_all_mp(partition_by_cores(texts))

    # initialize the probability to 1
    prob = 1.0
    aggregated_token_indexes = []

    # we are only dealing with one sentence, so only look at tokens[0]
    for token in tokens[0]:

        # get the vocabulary index of the token
        token_idx = stoi[token]

        # don't predict on a zero-length array
        if len(aggregated_token_indexes) > 0:

            # we want a [x,1] array where x is the number
            #  of words inputted (including the prefix tokens)
            ary = np.reshape(np.array(aggregated_token_indexes), (-1, 1))

            # turn this array into a tensor
            tensor = torch.from_numpy(ary)

            # wrap it in a torch Variable
            variable = Variable(tensor)

            # batch size of 1
            model[0].bs = 1

            # make sure we are in evaluation mode
            model.eval()
            model.reset()

            # predict what word comes next, based on the text BEFORE this current word
            res, *_ = model(variable)

            # res[-1] contains the scores of each possible token in the vocabulary;
            # use softmax to turn them into probabilities
            all_token_probs = F.softmax(res[-1]).data

            # find the probability of this token and multiply it by
            # the probability of all the previous text
            prob *= all_token_probs[token_idx]

        # append this token's index for use in the next loop iteration
        aggregated_token_indexes.append(token_idx)

    return prob
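
A quick usage sketch (the sentences here are just made-up examples), assuming m is the trained language model referenced earlier in the thread:

print(calc_prob(m, 'the cat sat on the mat'))
print(calc_prob(m, 'mat the on sat cat the'))

One caveat: multiplying many small probabilities underflows quickly, so for longer sentences (or for beam search) it is probably safer to sum log probabilities instead.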

Hi, thanks for the code, I will try it.