This would make the function more reusable for seq2seq tasks, where I’m often seeing examples in which the “source” sequence does not include the BOS token while the “target” sequence does.
This would be a nice addition to the existing mark_fields argument, which dictates the inclusion of FLD tokens. My recommendation is that the signature be:
Note that we don’t have an eos tag since it’s redundant with bos. We can certainly add the bos flag, and if you want to make a PR to add this bool, it would be most welcome.
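To make the proposal concrete, here is a minimal sketch of what a BOS flag alongside mark_fields could look like. `join_fields` and `split_on_bos` are hypothetical helpers written for illustration, not the library's API. The token strings `xxbos`/`xxfld` and the "FLD followed by field index" convention are assumptions loosely modelled on fastai's text module, and whitespace splitting stands in for a real tokenizer. The second half illustrates the point about eos: when documents are concatenated, each BOS already marks where the previous document ended.

```python
from typing import List

# Assumed special-token strings (loosely modelled on fastai's xxbos / xxfld).
BOS, FLD = "xxbos", "xxfld"

def join_fields(fields: List[str], mark_fields: bool = False,
                add_bos: bool = True) -> List[str]:
    """Hypothetical helper: join a row's text fields into one token list.

    mark_fields: prefix each field with FLD and its 1-based index.
    add_bos:     prefix the whole sequence with BOS (the proposed flag).
    """
    toks: List[str] = [BOS] if add_bos else []
    for i, field in enumerate(fields, start=1):
        if mark_fields:
            toks += [FLD, str(i)]
        toks += field.split()  # whitespace split stands in for a real tokenizer
    return toks

def split_on_bos(stream: List[str]) -> List[List[str]]:
    """Recover documents from a BOS-delimited stream (assumes it starts with BOS)."""
    docs: List[List[str]] = []
    for t in stream:
        if t == BOS:
            docs.append([])  # a new BOS means the previous document has ended
        else:
            docs[-1].append(t)
    return docs

# seq2seq: source without BOS, target with BOS
src = join_fields(["le chat"], add_bos=False)   # ['le', 'chat']
tgt = join_fields(["the cat"])                  # ['xxbos', 'the', 'cat']

# Language-model stream: concatenated documents, BOS only, no EOS needed
docs = [["hello", "world"], ["good", "bye"]]
stream = [t for d in docs for t in [BOS] + d]
recovered = split_on_bos(stream)                # same as docs
```

An EOS flag could be added the same way, appending a token after the last field, but the round trip above shows why it carries no extra information when every document starts with BOS.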
Yah, that is what I’m working on … I didn’t want to step on anything you all were planning/doing, but if you like, I’ll submit the PR with flags for both the BOS and EOS tags. lmk.