You are probably making one of the following two assumptions, just as I did:
- The input is fed directly to a sigmoid function
- There is a matrix multiplication involved before piping it through the sigmoid function, but the weights and biases are the same for all three cases.
If you look closely, the input and the state are the same for all three cases. Each case involves a matrix multiplication, and the result of that multiplication is what gets passed to the sigmoid function. If the weights and biases used in these operations differ across the three cases, we can expect the sigmoid outputs to be completely different as well.
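To make that concrete, here is a minimal sketch in plain Python. It assumes the three cases are the usual LSTM forget, input, and output gates, each computing `sigmoid(W·x + U·h + b)`. The names `gate` and `random_params`, the sizes, and the random weights are all made up for illustration; they are not taken from any particular library or trained model.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(W, U, b, x, h):
    """Compute sigmoid(W x + U h + b), one value per output unit."""
    out = []
    for W_row, U_row, b_i in zip(W, U, b):
        z = (sum(w * x_j for w, x_j in zip(W_row, x))
             + sum(u * h_j for u, h_j in zip(U_row, h))
             + b_i)
        out.append(sigmoid(z))
    return out

def random_params(rng, n_out, n_in, n_state):
    """Hypothetical helper: draw one independent set of weights and biases."""
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    U = [[rng.uniform(-1, 1) for _ in range(n_state)] for _ in range(n_out)]
    b = [rng.uniform(-1, 1) for _ in range(n_out)]
    return W, U, b

rng = random.Random(0)   # fixed seed so the run is repeatable
x = [0.5, -1.0, 2.0]     # the same input for all three gates
h = [0.1, 0.3]           # the same state for all three gates

# Each gate gets its own weights and biases; x and h never change.
gates = {
    name: gate(*random_params(rng, 2, 3, 2), x, h)
    for name in ("forget", "input", "output")
}

for name, values in gates.items():
    print(name, values)
```

All three calls receive identical `x` and `h`, yet each prints a different pair of values between 0 and 1, because the only thing that varies is the set of weights and biases.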
And sure enough…