I’m working on an implementation of “Attention Is All You Need” and referring to 8-translation-transformer.ipynb. I can’t figure out the reason for shifting the target by adding a pad token at the beginning.
Here is the code:
import torch.nn.functional as F

def shift_tfm(b):
    x, y = b
    y = F.pad(y, (1, 0), value=1)  # prepend a pad token (id 1) to the target
    return [x, y[:, :-1]], y[:, 1:]  # decoder input is y shifted right; labels are the original y
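For concreteness, here is a minimal sketch of what the transform produces on a toy batch (the tensor values are just an illustrative example I made up, not from the notebook; I'm assuming token id 1 is the pad token, as in the code above):

import torch
import torch.nn.functional as F

def shift_tfm(b):
    x, y = b
    y = F.pad(y, (1, 0), value=1)
    return [x, y[:, :-1]], y[:, 1:]

x = torch.tensor([[5, 6, 7]])      # source tokens, shape (batch, src_len)
y = torch.tensor([[10, 11, 12]])   # target tokens, shape (batch, tgt_len)

(src, dec_inp), labels = shift_tfm((x, y))
print(dec_inp)  # tensor([[ 1, 10, 11]])  -> target shifted right, starting with the pad token
print(labels)   # tensor([[10, 11, 12]])  -> original target, used for the loss

So the decoder input is the target shifted right by one position, while the labels stay unshifted, and that offset of one is exactly what I'm trying to understand the reason for.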