From the video about neural translation (attached screenshot below): in the forward pass of the decoder, why is the decoder's output hidden state reinjected at each iteration instead of only using the encoder hidden state? Wouldn't that even harm learning, since the decoder would no longer be decoding the encoder's representation but decoding its own output?
From what I understand, the loop is being used because `nn.RNN` cannot use the output at cell N as the input to cell N+1. Ideally, we'd do the above loop in one forward pass, where the encoder hidden state is used and each prediction is chained into the next decoder unit's input. Is this right?
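To check my understanding, here is a minimal sketch of the loop as I picture it (the module names, sizes, and `SOS_token` are made up for illustration, not taken from the video):

```python
import torch
import torch.nn as nn

# Made-up sizes and modules, just to illustrate the loop structure
hidden_size, vocab_size, max_len = 256, 1000, 10
embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size)           # run one step at a time
out_proj = nn.Linear(hidden_size, vocab_size)

SOS_token = 0
encoder_hidden = torch.zeros(1, 1, hidden_size)  # stand-in for the encoder's final hidden state

decoder_input = torch.tensor([[SOS_token]])
decoder_hidden = encoder_hidden                  # initialized from the encoder...
for t in range(max_len):
    emb = embedding(decoder_input).view(1, 1, -1)
    output, decoder_hidden = gru(emb, decoder_hidden)  # ...then the decoder's own hidden state is reinjected
    logits = out_proj(output[0])
    decoder_input = logits.argmax(dim=1).unsqueeze(0)  # prediction chained into the next input
```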
If it is, then we would want to keep the same encoder hidden state in each iteration, and the loss would be calculated only on the tensors from the last iteration. By that last iteration we would have gone through the whole decoder, chaining predictions into inputs for all units, with the loss computed only on this final timestep, as in the sketch below.
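In code, the variant I'm imagining would be something like this (reusing the same made-up modules as above):

```python
# The variant I'm asking about: reuse the encoder hidden state at every step,
# chain only the predicted tokens, and compute the loss on the last step alone
decoder_input = torch.tensor([[SOS_token]])
for t in range(max_len):
    emb = embedding(decoder_input).view(1, 1, -1)
    output, _ = gru(emb, encoder_hidden)         # same encoder hidden state every iteration
    logits = out_proj(output[0])
    decoder_input = logits.argmax(dim=1).unsqueeze(0)
# loss would be computed only on this final `logits`
```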