I am trying to understand the order in which errors are calculated and weights are updated in a sequence RNN.
Are the errors calculated at the same time as the weights are updated, at the end of the sequence?
My understanding is that the error at each layer (say, layer t) of the unrolled network is a function of the weights of the following layer (t+1).
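For concreteness, here is that dependence written out for a vanilla tanh RNN. The symbols W (hidden-to-hidden weights), U (input weights), a_t (pre-activation), L_t (per-step loss) and the error signal δ_t are assumptions chosen for illustration and are not taken from the question:

```latex
\begin{aligned}
a_t &= W h_{t-1} + U x_t, \qquad h_t = \tanh(a_t), \qquad L = \textstyle\sum_{t=1}^{T} L_t \\
\frac{\partial L}{\partial h_t} &= \frac{\partial L_t}{\partial h_t} + W^{\top}\delta_{t+1}, \qquad
\delta_t \equiv \frac{\partial L}{\partial a_t} = \left(1 - h_t \odot h_t\right) \odot \frac{\partial L}{\partial h_t}, \qquad
\delta_{T+1} = 0 \\
\frac{\partial L}{\partial W} &= \sum_{t=1}^{T} \delta_t\, h_{t-1}^{\top}, \qquad
\frac{\partial L}{\partial U} = \sum_{t=1}^{T} \delta_t\, x_t^{\top}
\end{aligned}
```

Note that W carries no time index: the unrolled layers share a single matrix, so the W that links step t to step t+1 is the same W that appears at every other step.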
So do you calculate the output error and immediately update the weights for that layer before moving down to the next layer, since the error at the next layer down (t-1) depends on the weights you have just updated at layer t?
Or do you first calculate the errors all the way back through the sequence and only then update the weights? That would imply the same weight matrix is used for every error calculation, which seems wrong to me.
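A minimal sketch of the second ordering, assuming a one-unit (scalar) tanh RNN with a hypothetical recurrent weight `w`, input weight `u`, zero initial state and a squared-error loss at every step. The backward pass reuses the forward-pass `w` at every step, accumulates the gradients, and applies a single update afterwards. A finite-difference check compares the accumulated gradient against the true derivative of the sequence loss. All function names here are illustrative, not from any library:

```python
import math


def forward(w, u, xs, h0=0.0):
    """Scalar RNN h_t = tanh(w*h_{t-1} + u*x_t); returns [h_0, h_1, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def loss(w, u, xs, ys):
    """Sum over all steps of 0.5*(h_t - y_t)^2 (assumed loss)."""
    hs = forward(w, u, xs)
    return sum(0.5 * (h - y) ** 2 for h, y in zip(hs[1:], ys))


def bptt_grads(w, u, xs, ys):
    """Backpropagation through time: walk from the last step to the first,
    using the SAME w (the one from the forward pass) at every step and
    accumulating the gradients instead of updating as we go."""
    hs = forward(w, u, xs)
    dw, du = 0.0, 0.0
    dh_next = 0.0                        # error flowing into h_t from step t+1
    for t in reversed(range(len(xs))):
        h_t, h_prev = hs[t + 1], hs[t]
        dh = (h_t - ys[t]) + dh_next     # local error + error from later steps
        da = dh * (1.0 - h_t ** 2)       # back through tanh
        dw += da * h_prev                # accumulate; no update yet
        du += da * xs[t]
        dh_next = da * w                 # unchanged forward-pass w
    return dw, du


def sgd_step(w, u, xs, ys, lr=0.1):
    """One weight update, applied only after the whole backward pass."""
    dw, du = bptt_grads(w, u, xs, ys)
    return w - lr * dw, u - lr * du


def numeric_grads(w, u, xs, ys, eps=1e-6):
    """Central finite differences of the loss, for checking bptt_grads."""
    dw = (loss(w + eps, u, xs, ys) - loss(w - eps, u, xs, ys)) / (2 * eps)
    du = (loss(w, u + eps, xs, ys) - loss(w, u - eps, xs, ys)) / (2 * eps)
    return dw, du


if __name__ == "__main__":
    xs, ys = [0.5, -1.0, 0.25], [0.1, -0.3, 0.2]
    print(bptt_grads(0.8, 0.6, xs, ys))
    print(numeric_grads(0.8, 0.6, xs, ys))
```

If the two printed pairs agree to several decimal places, the accumulated gradients are the exact derivative of the sequence loss with respect to the shared weights. Updating `w` partway through the backward pass would mix two different parameter settings and would in general no longer match this derivative.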