In the attention part of the tutorial, the `train` function looks like this:
```python
def train(input_variable, target_variable, encoder, decoder, encoder_optimizer,
          decoder_optimizer, criterion, max_length=MAX_LENGTH):
    ...
    encoder_outputs = Variable(torch.zeros(max_length, encoder.hidden_size))
    loss = 0
    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(input_variable[ei], encoder_hidden)
        encoder_outputs[ei] = encoder_output[0][0]
    ...
```
I have some questions about this piece of code.
- Why doesn't `encoder_outputs` have to account for the `batch_size` of the encoder's input? I expected something like `Variable(torch.zeros(batch_size, max_length, encoder.hidden_size))`. (See the first sketch after this list.)
- Why does `encoder_outputs` only record the first word's output from the encoder? Why not the last output, i.e. `encoder_outputs[ei] = encoder_output[0][-1]`? (See the second sketch below.)
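
For the first question, here is a minimal shape check I put together. It assumes the tutorial's setup: a GRU encoder fed one embedded token at a time with batch size 1. The value `hidden_size=256` is just an example, and I use plain tensors since `Variable` wrapping shouldn't change the shapes:

```python
import torch
import torch.nn as nn

hidden_size = 256  # example value, not from the tutorial
gru = nn.GRU(hidden_size, hidden_size)

# One embedded token, shaped (seq_len=1, batch=1, hidden_size),
# which is what the tutorial's EncoderRNN passes to its GRU each step.
embedded = torch.zeros(1, 1, hidden_size)
hidden = torch.zeros(1, 1, hidden_size)

output, hidden = gru(embedded, hidden)
print(output.shape)        # torch.Size([1, 1, 256])
print(output[0][0].shape)  # torch.Size([256]), what gets stored in encoder_outputs[ei]
```

So `encoder_outputs` ends up shaped `(max_length, hidden_size)` with no batch dimension at all, which is what prompted the question.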
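
And for the second question, under the same assumptions, indexing the first or the last element of `encoder_output[0]` seems to give the same vector, since both `seq_len` and `batch` are 1 here:

```python
import torch
import torch.nn as nn

hidden_size = 256  # example value, not from the tutorial
gru = nn.GRU(hidden_size, hidden_size)
output, _ = gru(torch.zeros(1, 1, hidden_size), torch.zeros(1, 1, hidden_size))

# output is (1, 1, hidden_size), so [0][0] and [0][-1] pick the same vector.
print(torch.equal(output[0][0], output[0][-1]))  # True
```

So I don't see what choosing the "first" output buys here, or whether there is a case where the two would differ.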