Is it possible to use beam search during the training of LMs?

I see beam search available for inference … but I was wondering if it is possible (or even advisable) to use it during training?

By default, most implementations (including fastai) use greedy decoding in training these models … and while I see folks on here using beam search for inference, I can’t say I’ve seen anything in terms of folks using it during training (which seems like it could be advantageous).

Anyhow, is it possible? Recommended? Implemented? W/r/t the last question, if it isn’t, what would be the best approach to implementing it?

Sequence-to-Sequence Learning as Beam-Search Optimization discusses how they use beam search during training.
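For anyone following along who hasn't looked at the mechanics, here is a minimal sketch of plain beam search next to greedy decoding. To be clear, this is not the training procedure from the paper above and not fastai's implementation. `toy_log_probs` is a made-up stand-in for a model's next-token distribution, and the token names and probabilities are invented purely for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def toy_log_probs(prefix: Sequence) -> Dict[str, float]:
    """Hypothetical stand-in for a language model: next-token log-probs for a prefix."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    # Any prefix not in the table can only emit an end-of-sequence token.
    probs = table.get(prefix, {"<eos>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


def beam_search(
    step_fn: Callable[[Sequence], Dict[str, float]],
    beam_width: int,
    max_len: int,
) -> List[Tuple[Sequence, float]]:
    """Keep the `beam_width` highest-scoring partial sequences at every step."""
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Sequence, float]] = []
        for seq, score in beams:
            if seq and seq[-1] == "<eos>":
                # Finished hypotheses are carried over unchanged.
                candidates.append((seq, score))
                continue
            for tok, log_p in step_fn(seq).items():
                candidates.append((seq + (tok,), score + log_p))
        # Sort by cumulative log-probability, best first, and prune.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# beam_width=1 is greedy decoding: it commits to "a" and ends at ("a", "x").
greedy_seq, greedy_score = beam_search(toy_log_probs, beam_width=1, max_len=2)[0]
# beam_width=2 keeps "b" alive and finds the higher-probability ("b", "x").
beam_seq, beam_score = beam_search(toy_log_probs, beam_width=2, max_len=2)[0]
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

In this toy setup greedy ends with a sequence of probability about 0.33, while a beam of width 2 finds one with probability about 0.36. Using something like this inside a training loop is a separate question, since the search itself is not differentiable and the loss has to be defined over the hypotheses the beam keeps.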
