RNN - hidden state doubts

A few doubts about the RNN hidden state:

  1. Is the hidden state set to zeros only once at init time, or after every BPTT-length chunk of character input?

  2. Repackaging retains the values of h but makes PyTorch forget how they were derived, thereby truncating the chain of backpropagation (see the sketch below). Right?

  3. In the situation where the incoming batch size differs from the size of the remembered h, could we not subset h to use the appropriately sized part instead of reinitializing it all to zeros (when bs < h.size(1)), and pad with zeros for the extra size when bs > h.size(1)?

This may not be a big factor in training, since a different bs happens only once per epoch, but could it be a factor during inference?
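
For point 2, here is how I understand "repackaging": keep the numbers in h but cut the autograd history, so backprop through time stops at the minibatch boundary. A minimal sketch (the helper name is mine, modelled on the old PyTorch word-language-model example):

```python
import torch

def repackage_var(h):
    # Keep the hidden state's values but detach them from the autograd graph,
    # so gradients don't flow back past this point.
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_var(v) for v in h)  # handles (h, c) for an LSTM
```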

  1. Take a look at the version where we use a for loop and tell us what you see. Also, what do you think would be necessary to make it work (try looking at the very first multi-layer net version, before we use a for loop, if you’re not sure)?
  2. Right
  3. I guess we could, but as you say it’s not really important. At inference time I always have a batch size of 1 anyway.
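
If anyone does want to try the subsetting idea from point 3, it could look roughly like this. This is a hypothetical helper, assuming h has shape (n_layers, bs, n_hidden); the lesson code simply reinitializes to zeros instead:

```python
import torch

def resize_hidden(h, bs):
    # Hypothetical: reuse the first bs columns of the remembered state when the
    # new batch is smaller, and pad with zeros when it is larger.
    if bs <= h.size(1):
        return h[:, :bs].contiguous()
    pad = h.new_zeros(h.size(0), bs - h.size(1), h.size(2))
    return torch.cat([h, pad], dim=1)
```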

> Is the hidden state set to zeros only once at init time, or after every BPTT-length chunk of character input?

So in Char3Model, h is set to zeros on each forward, since all three characters are sent in together as input.

In CharLoopModel, h is again set to zeros on each forward, and the same is true in CharRnn and CharSeqRnn.

But in CharSeqStatefulRnn it's set to zeros only on init, or when bs differs from the size of h. So h is remembered across different minibatches. Right?

I guess that is OK, since it could help the model remember something of long-term consequence, beyond the BPTT limit?
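
To check my understanding of that reset logic, here is a rough sketch of the pattern. The names (e.g. init_hidden) and layer sizes are my own placeholders, not the exact lesson code, but the bs check in forward is the part I'm asking about:

```python
import torch
import torch.nn as nn

class StatefulCharRnn(nn.Module):
    # Rough sketch of the "stateful" pattern, not the actual notebook code.
    def __init__(self, vocab_size, n_fac=42, n_hidden=256, bs=64):
        super().__init__()
        self.n_hidden = n_hidden
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        self.init_hidden(bs)               # zeros only once, at construction

    def init_hidden(self, bs):
        self.h = torch.zeros(1, bs, self.n_hidden)

    def forward(self, cs):
        bs = cs.size(1)                    # cs: (seq_len, bs) character indices
        if self.h.size(1) != bs:           # reset only when bs changes
            self.init_hidden(bs)
        outp, h = self.rnn(self.e(cs), self.h)
        self.h = h.detach()                # "repackage": keep values, drop history
        return torch.log_softmax(self.l_out(outp), dim=-1)
```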

Thanks

Right. We want to remember state as long as possible. That’s why this version is called “stateful”.