I’m working on an implementation of “Attention Is All You Need” and referring to 8-translation-transformer.ipynb. I can’t figure out the reason for shifting the target by adding a pad token at the beginning.
Here is the code:
import torch.nn.functional as F

def shift_tfm(b):
    x, y = b
    y = F.pad(y, (1, 0), value=1)  # prepend a pad token (id 1) to the target
    return [x, y[:, :-1]], y[:, 1:]  # decoder input is y shifted right; labels are the original y
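For concreteness, here is a minimal sketch of what the transform produces on a toy batch (the tensor values are just an illustrative example I made up, not from the notebook; I'm assuming token id 1 is the pad token, as in the code above):

import torch
import torch.nn.functional as F

def shift_tfm(b):
    x, y = b
    y = F.pad(y, (1, 0), value=1)
    return [x, y[:, :-1]], y[:, 1:]

x = torch.tensor([[5, 6, 7]])      # source tokens, shape (batch, src_len)
y = torch.tensor([[10, 11, 12]])   # target tokens, shape (batch, tgt_len)

(src, dec_inp), labels = shift_tfm((x, y))
print(dec_inp)  # tensor([[ 1, 10, 11]])  -> target shifted right, starting with the pad token
print(labels)   # tensor([[10, 11, 12]])  -> original target, used for the loss

So the decoder input is the target shifted right by one position, while the labels stay unshifted, and that offset of one is exactly what I'm trying to understand the reason for.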