I believe that there are a few issues with the implementation of beam search.
- The first is that the model is not put into eval mode. This is also true for `learner.predict`. Is this intentional?
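For reference, here is a minimal pure-Python stand-in for dropout (the `TinyDropout` class and the numbers are invented for illustration, not fastai code) showing why inference stays nondeterministic until the model is put in eval mode:

```python
import random

class TinyDropout:
    """Toy stand-in for nn.Dropout, to show why eval mode matters."""
    def __init__(self, p=0.5):
        self.p, self.training = p, True
    def eval(self):
        self.training = False
    def __call__(self, xs):
        if not self.training:
            return list(xs)  # eval mode: identity, fully deterministic
        # training mode: randomly zero activations (inverted dropout scaling)
        return [0.0 if random.random() < self.p else x / (1 - self.p) for x in xs]

drop = TinyDropout(p=0.5)
random.seed(0)
a = drop([1.0, 2.0, 3.0])
b = drop([1.0, 2.0, 3.0])   # training mode: two runs generally differ

drop.eval()
c = drop([1.0, 2.0, 3.0])
d = drop([1.0, 2.0, 3.0])   # eval mode: identical every time
print(a == b, c == d)       # training-mode outputs differ; eval-mode outputs match
```

This noise is also what masks the duplicate-path behavior: in training mode, nominally identical beam rows receive slightly different probabilities.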
- The more important issue has to do with the beam search algorithm itself. The way it is currently implemented, when `top_k` is equal to `beam_sz`, the algorithm returns `top_k` copies of the same string. Please note that you will only see this behavior if you first put the model in eval mode; otherwise dropout will cause random fluctuations in the probabilities. As I understand this implementation, `top_k` controls how many child tokens from any one particular path down the search tree should be considered during the selection of the top `beam_sz` paths. If `top_k` is set to the size of the vocabulary, it should therefore have no influence on the beam search algorithm, and you would expect the algorithm to find `beam_sz` different paths by the time the inner loop is complete. However, as currently implemented, it will only return `beam_sz` copies of one unique path.
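A toy pure-Python sketch (not the fastai code; the 4-token vocabulary, the made-up log-probabilities, and the `expand` helper are all invented for illustration, with the same distribution reused at every step) shows why seeding the beam with `top_k` identical copies of the prompt collapses the search:

```python
import heapq

# Hypothetical per-step log-probs over a 4-token vocabulary.
log_probs = [-0.1, -1.2, -2.3, -3.4]
top_k, beam_sz = 3, 3

def expand(beam):
    """Expand each (score, path) by its top_k children, keep the beam_sz best."""
    candidates = []
    for score, path in beam:
        best = heapq.nlargest(top_k, enumerate(log_probs), key=lambda t: t[1])
        for tok, lp in best:
            candidates.append((score + lp, path + (tok,)))
    return sorted(candidates, reverse=True)[:beam_sz]

buggy = [(0.0, (0,))] * top_k   # like xb.repeat(top_k, 1): identical rows
fixed = [(0.0, (0,))]           # single row; the first expansion diversifies it

for _ in range(2):
    buggy, fixed = expand(buggy), expand(fixed)

print(len({path for _, path in buggy}))  # 1 -- every beam slot holds the same path
print(len({path for _, path in fixed}))  # 3 -- beam_sz distinct paths
```

With identical starting rows, every candidate is duplicated once per row, so the top `beam_sz` selections are all copies of the single best path; starting from one row, the beam fills with distinct continuations instead.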
The fix is as simple as commenting out the line that initializes `xb` to have `top_k` copies:
```python
def beam_search(self, text:str, n_words:int, no_unk:bool=True, top_k:int=10, beam_sz:int=1000, temperature:float=1.,
                sep:str=' ', decoder=decode_spec_tokens):
    "Return the `n_words` that come after `text` using beam search."
    ds = self.data.single_dl.dataset
    self.model.reset()
    self.model.eval()
    xb, yb = self.data.one_item(text)
    nodes = None
    # xb = xb.repeat(top_k, 1)
    nodes = xb.clone()
    scores = xb.new_zeros(1).float()
    ...
```
To test this out, I started with the `tutorial.inference.ipynb` notebook. I added a method called `beam_search_v2` that puts the model in eval mode, comments out the line above, and returns the list of `beam_sz` paths found at the end of the algorithm rather than a random sample from them. The code is:
```python
def beam_search_v2(self, text:str, n_words:int, no_unk:bool=True, top_k:int=10, beam_sz:int=1000, temperature:float=1.,
                   sep:str=' ', decoder=decode_spec_tokens):
    "Return the `n_words` that come after `text` using beam search."
    #import pdb; pdb.set_trace()
    ds = self.data.single_dl.dataset
    self.model.reset()
    self.model.eval()
    xb, yb = self.data.one_item(text)
    nodes = None
    #xb = xb.repeat(top_k, 1)
    nodes = xb.clone()
    scores = xb.new_zeros(1).float()
    with torch.no_grad():
        for k in progress_bar(range(n_words), leave=False):
            out = F.log_softmax(self.model(xb)[0][:,-1], dim=-1)
            if no_unk: out[:,self.data.vocab.stoi[UNK]] = -float('Inf')
            values, indices = out.topk(top_k, dim=-1)
            scores = (-values + scores[:,None]).view(-1)
            indices_idx = torch.arange(0,nodes.size(0))[:,None].expand(nodes.size(0), top_k).contiguous().view(-1)
            sort_idx = scores.argsort()[:beam_sz]
            scores = scores[sort_idx]
            nodes = torch.cat([nodes[:,None].expand(nodes.size(0),top_k,nodes.size(1)),
                               indices[:,:,None].expand(nodes.size(0),top_k,1),], dim=2)
            nodes = nodes.view(-1, nodes.size(2))[sort_idx]
            self.model[0].select_hidden(indices_idx[sort_idx])
            xb = nodes[:,-1][:,None]
    return nodes
```
You can then count the number of unique paths returned by the beam search as follows (note the fix of the original `print`, which referenced an undefined `num_unique`):

```python
nodes = learn.beam_search_v2('This is a simple test of', n_words=40, beam_sz=10, top_k=10)
num_unique_paths = len(set(tuple(x.item() for x in n) for n in nodes))
print(f"There are {num_unique_paths} unique paths")
```
With the line commented out, we get the expected 10 unique paths; with the line left in, we get only 1.
I am happy to make a pull request if others agree that there is a bug here. Thanks!