Clockwork RNNs

I have been reading up on Clockwork RNNs (CW-RNNs) and trying to understand the math behind backpropagation in these networks. I am stuck on calculating the gradient of the cost function with respect to the input weights. Can anyone help me understand it in detail, or point me to some links where I can find the derivation? Thanks in advance.

Same here. Looking forward to answers…
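For the input-weight gradient asked about above, here is a minimal NumPy sketch of one way to work it out. It relies on simplifying assumptions that are not taken from the CW-RNN paper: tanh hidden units, no biases, a zero initial state, time steps counted from t = 0, and a squared-error loss placed directly on the hidden state instead of on an output layer. The hidden units are split into modules with clock periods T_i. Module i updates at step t only when t mod T_i == 0 and otherwise copies its previous value. The recurrent matrix W_H is block upper triangular, so that with modules sorted by increasing period, slower modules feed faster ones. Under these assumptions the gradient reduces to dL/dW_I = sum_t (m_t * delta_t) x_t^T. Here m_t is the 0/1 mask of units active at step t, and delta_t = dL/dh_t * tanh'(a_t) is the error on the pre-activation. An inactive unit contributes nothing to W_I at that step but passes its error unchanged back to step t-1. The function names cw_rnn_loss_and_input_grad and block_upper_mask are hypothetical.

```python
import numpy as np

def block_upper_mask(n_modules, module_size):
    # Modules sorted by increasing period; module i reads from modules j >= i (equal or slower).
    return np.kron(np.triu(np.ones((n_modules, n_modules))), np.ones((module_size, module_size)))

def cw_rnn_loss_and_input_grad(W_I, W_H, xs, ys, periods, module_size):
    """Clockwork RNN hidden layer: forward pass plus BPTT for dL/dW_I.

    Assumption (for brevity): L = 0.5 * sum_t ||h_t - y_t||^2 on the hidden state,
    tanh activation, no biases, h_{-1} = 0, and t starting at 0.
    """
    unit_period = np.repeat(periods, module_size)   # clock period of each hidden unit
    h = np.zeros(W_H.shape[0])
    cache = []                                       # (x_t, h_t, active mask) per step
    loss = 0.0
    for t, (x, y) in enumerate(zip(xs, ys)):
        active = (t % unit_period == 0)             # module fires when t mod T_i == 0
        a = W_H @ h + W_I @ x                       # pre-activation for all units
        h = np.where(active, np.tanh(a), h)         # inactive units keep h_{t-1}
        loss += 0.5 * np.sum((h - y) ** 2)
        cache.append((x, h, active))

    dW_I = np.zeros_like(W_I)
    dh_next = np.zeros_like(h)                      # dL/dh_t arriving from step t+1
    for (x, h, active), y in zip(reversed(cache), reversed(ys)):
        dh = (h - y) + dh_next                      # total gradient w.r.t. h_t
        da = np.where(active, dh * (1.0 - h ** 2), 0.0)  # tanh'(a_t) = 1 - h_t^2, active units only
        dW_I += np.outer(da, x)                     # only active units at step t touch W_I
        # To h_{t-1}: through W_H for active units, identity (copy) for inactive ones.
        dh_next = W_H.T @ da + np.where(active, 0.0, dh)
    return loss, dW_I

# Small usage example with exponential periods 1, 2, 4, 8 and two units per module.
rng = np.random.default_rng(0)
periods = np.array([1, 2, 4, 8])
module_size, d_in, steps = 2, 3, 10
n_hidden = len(periods) * module_size
W_I = rng.normal(scale=0.5, size=(n_hidden, d_in))
W_H = rng.normal(scale=0.5, size=(n_hidden, n_hidden)) * block_upper_mask(len(periods), module_size)
xs = rng.normal(size=(steps, d_in))
ys = rng.normal(size=(steps, n_hidden))
loss, dW_I = cw_rnn_loss_and_input_grad(W_I, W_H, xs, ys, periods, module_size)
```

Comparing dW_I against central finite differences on each entry of W_I is a quick way to check a derivation like this. With an output layer on top of the hidden state, only the local error term (h - y) would change: it becomes W_O^T times the error at the output pre-activation.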