Questions about the attention model in seq2seq-translation.ipynb

In the attention part, the train function looks like this:

def train(input_variable, target_variable, encoder, decoder, encoder_optimizer, 
          decoder_optimizer, criterion, max_length=MAX_LENGTH):

    ...

    encoder_outputs = Variable(torch.zeros(max_length, encoder.hidden_size))  # one hidden_size vector per source position
    loss = 0

    for ei in range(input_length):
        # feed the source sentence through the encoder one word at a time
        encoder_output, encoder_hidden = encoder(input_variable[ei], encoder_hidden)
        encoder_outputs[ei] = encoder_output[0][0]  # store this step's output for the attention decoder
   
    ...
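For context, here is how I understand the shapes at each encoder step. This is just my own minimal sketch (assuming the tutorial's single-layer GRU encoder with a batch size of 1; hidden_size = 256 is only a placeholder), so please correct me if the assumption is wrong:

import torch
import torch.nn as nn

# stand-in for one step of the tutorial's EncoderRNN (my assumption: single-layer GRU, batch of 1)
hidden_size = 256
gru = nn.GRU(hidden_size, hidden_size)
embedded = torch.zeros(1, 1, hidden_size)   # (seq_len=1, batch=1, input_size)
hidden = torch.zeros(1, 1, hidden_size)     # (num_layers=1, batch=1, hidden_size)
output, hidden = gru(embedded, hidden)
print(output.size())   # torch.Size([1, 1, 256]) -> output[0][0] is what gets stored in encoder_outputs[ei]
print(hidden.size())   # torch.Size([1, 1, 256])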


I have some questions about this piece of code.

  • Why doesn't encoder_outputs have to account for the batch size of the encoder's input? I would have expected something like the following (see the sketch after this list):
encoder_outputs = Variable(torch.zeros(batch_size, max_length, encoder.hidden_size))
  • Why does encoder_outputs only record the first output from the encoder? Why not the last one, i.e.:
encoder_outputs[ei] = encoder_output[0][-1]
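To make the first question concrete, here is the batched buffer I would have expected instead (purely my own illustration with made-up sizes, not code from the notebook):

import torch

# hypothetical batched version of encoder_outputs (my sketch, not the tutorial's code)
batch_size, max_length, hidden_size = 32, 10, 256
encoder_outputs_batched = torch.zeros(batch_size, max_length, hidden_size)  # one vector per source position, per batch example
print(encoder_outputs_batched.size())   # torch.Size([32, 10, 256])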