It seems that the current Transformer implementation cannot support an input_mask.
When constructing the input for a Transformer encoder or BERT, we always pad the input, e.g.,
batch: A B C [pad] [pad] --> input_mask 1 1 1 0 0
batch: D E [pad] [pad] [pad] --> input_mask 1 1 0 0 0,
where the input_mask is applied in MultiHeadAttention to avoid attending to the padding positions.
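For illustration, here is a minimal sketch (not fastai's actual code) of how such a padding mask is typically applied inside scaled dot-product attention: the scores of padded key positions are set to -inf before the softmax, so they receive zero attention weight. The function name `masked_attention` and the tensor shapes are my own assumptions.

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, input_mask):
    # q, k, v: (batch, seq_len, d); input_mask: (batch, seq_len), 1 = token, 0 = pad
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)      # (batch, seq, seq)
    # Broadcast the mask over query positions and block padded keys.
    scores = scores.masked_fill(input_mask[:, None, :] == 0, float('-inf'))
    attn = F.softmax(scores, dim=-1)                            # pad columns get weight 0
    return attn @ v

# Toy batch: 5 positions, the last two are padding.
q = k = v = torch.randn(1, 5, 8)
mask = torch.tensor([[1, 1, 1, 0, 0]])
out = masked_attention(q, k, v, mask)
```

Because the padded keys get zero weight, changing the values at those positions should leave the output unchanged, which is exactly the behavior the current implementation seems to be missing.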
New feature discussed here
Am I wrong, or is it indeed not implemented in fastai.text.models.Transformer?
Also, the Transformer notebook here does not consider the input mask either.