I would also like to know how we can get the probability of each word, since I need it for beam search. If res[-1] really is some kind of per-word score, maybe we could use softmax to convert the values to probabilities, something like nn.Softmax(res[-1]), but since I am a beginner in PyTorch, I am not sure whether that is correct.

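Yes, applying softmax to res[-1] gives the probability of every word in the vocabulary, since res[-1] still contains the scores of all words. Note that nn.Softmax is a module: you construct it and then call it, e.g. nn.Softmax(dim=-1)(res[-1]), or you use the functional form F.softmax(res[-1], dim=-1). Writing nn.Softmax(res[-1]) only constructs a layer (with res[-1] passed as its dim argument) and computes nothing. This is different from torch.topk(res[-1], X), which returns only the X largest values and their indices; on res[-1] directly those are raw scores, not probabilities, unless softmax is applied first.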
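A small sketch of both softmax forms and of topk, using a made-up score vector in place of res[-1] (the numbers are illustrative only). It assumes a recent PyTorch, where plain tensors can be passed without wrapping them in Variable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up raw scores (logits) for a 5-word vocabulary, standing in for res[-1]
scores = torch.tensor([2.0, 1.0, 0.1, -1.0, 3.0])

# Functional form: pass the tensor and the dimension to normalise over
probs = F.softmax(scores, dim=-1)

# Module form: construct the layer with dim, then call it on the tensor
probs_module = nn.Softmax(dim=-1)(scores)

# topk returns only the k largest values and their indices
top_probs, top_idx = torch.topk(probs, 2)

# Softmax preserves ordering, so topk on the raw scores picks the same indices
_, top_idx_from_scores = torch.topk(scores, 2)

print(top_idx.tolist(), top_probs.tolist())
```

Because softmax is monotonic, topk on the scores is enough to find the best candidates; softmax is only needed when the actual probability values matter.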

This seems to work. Thanks @cahya. @Boone you might be interested.

Any obvious mistakes?

```python
import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable
from fastai.text import *  # fastai 0.7: provides Tokenizer and partition_by_cores

def calc_prob(model, text):
    # stoi (token -> vocabulary index) is assumed to be a global built with the model's vocab
    texts = ['xbos xfld 1 ' + text]
    tokens = Tokenizer().proc_all_mp(partition_by_cores(texts))
    # batch size of 1 and evaluation mode (no dropout); these only need setting once
    model[0].bs = 1
    model.eval()
    # initialize probability to 1
    prob = 1.0
    aggregated_token_indexes = []
    # we are only dealing with one sentence, so only look at tokens[0]
    for token in tokens[0]:
        # get the index of the token
        token_idx = stoi[token]
        # don't predict on a zero-length array (the first token has no context)
        if len(aggregated_token_indexes) > 0:
            # we want an [x, 1] array where x is the number
            # of words input so far (including the prefix tokens)
            ary = np.reshape(np.array(aggregated_token_indexes), (-1, 1))
            # turn this array into a tensor and wrap it in a Variable
            variable = Variable(torch.from_numpy(ary))
            # clear the hidden state, since the whole prefix is fed in again
            model.reset()
            # predict what word comes next, based on the text BEFORE this current word
            res, *_ = model(variable)
            # res[-1] contains the scores of each possible token in the vocabulary;
            # softmax over the last dimension turns them into probabilities
            all_token_probs = F.softmax(res[-1], dim=-1).data
            # multiply this token's probability into the probability of all the previous text
            # (note that the prefix tokens 'xfld' and '1' are scored as well)
            prob *= float(all_token_probs[token_idx])
        # append this token for the next loop
        aggregated_token_indexes.append(token_idx)
    return prob
```
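Since the goal was beam search, one thing to watch in calc_prob: multiplying many probabilities underflows to 0.0 for long texts, so scores are usually accumulated as sums of log-probabilities (for example from F.log_softmax(res[-1], dim=-1)). The sketch below is self-contained and does not use the fastai model: TOY_MODEL, next_token_log_probs and beam_search are hypothetical names, and the toy bigram table merely stands in for the language model's next-word predictions.

```python
import math

# Why log space: multiplying many small probabilities underflows to 0.0
probs = [0.01] * 400
product = 1.0
for p in probs:
    product *= p
log_sum = sum(math.log(p) for p in probs)  # stays finite (about -1842)

# Hypothetical toy "language model": previous token -> next-token probabilities.
# In the real code these would come from softmax over res[-1] for the prefix.
TOY_MODEL = {
    "xbos": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.4, "xeos": 0.1},
    "a": {"cat": 0.9, "xeos": 0.1},
    "cat": {"xeos": 1.0},
    "dog": {"xeos": 1.0},
}


def next_token_log_probs(prefix):
    """Log-probability of each possible next token given the prefix."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[prefix[-1]].items()}


def beam_search(start, beam_width=2, max_len=5, eos="xeos"):
    # Each hypothesis is (sum of log-probs, token list)
    beams = [(0.0, [start])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for tok, lp in next_token_log_probs(tokens).items():
                # adding log-probs == multiplying probabilities
                candidates.append((score + lp, tokens + [tok]))
        # keep only the beam_width highest-scoring extensions
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            # hypotheses that emitted eos are set aside and not extended
            (finished if tokens[-1] == eos else beams).append((score, tokens))
        if not beams:
            break
    return sorted(finished + beams, key=lambda c: c[0], reverse=True)


for score, tokens in beam_search("xbos", beam_width=2):
    print(" ".join(tokens), round(math.exp(score), 2))
```

With beam_width=2 the search ranks "xbos a cat xeos" (probability 0.36) ahead of "xbos the cat xeos" (0.30), even though a greedy search would have committed to "the" (0.6) at the first step.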