A few doubts about the RNN hidden state:
Is the hidden state set to zeros only once, at init time? Or is it reset after every bptt-sized chunk of character input?
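For context, the pattern I'm asking about looks roughly like this (a minimal sketch I wrote myself, not the actual library code; all the names and sizes are made up):

```python
import torch

n_layers, bs, n_hidden, bptt, n_inp = 1, 4, 8, 10, 8
rnn = torch.nn.RNN(input_size=n_inp, hidden_size=n_hidden, num_layers=n_layers)

h = torch.zeros(n_layers, bs, n_hidden)  # zeroed here, once at init...
for _ in range(3):                       # ...or re-zeroed every bptt chunk?
    x = torch.randn(bptt, bs, n_inp)     # one bptt-length chunk of input
    out, h = rnn(x, h)                   # h carries over between chunks
```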
Repackaging retains the values of h but makes PyTorch forget how they were derived, thereby breaking the chain of backpropagation. Right?
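In other words, something like this minimal sketch (the `repackage` name is mine; I'm assuming it boils down to `detach`):

```python
import torch

def repackage(h):
    """Keep the values of h but cut them off from the graph that
    produced them, so backprop stops here."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    # LSTMs carry a tuple (h, c); handle it recursively
    return tuple(repackage(v) for v in h)
```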
In a situation where the incoming batch size differs from the batch size of the remembered h, couldn't we reuse the appropriately sized part of h instead of reinitializing it all to zeros: subsetting when bs < h.size(1), and padding with zeros for the extra rows when bs > h.size(1)? Roughly as sketched below.
This may not be a big factor in training, since a different bs would occur only once per epoch, but could it be a factor during inference?
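Here is what I have in mind (a hypothetical `resize_hidden` helper of my own, not anything from the library; I'm assuming h has shape `(n_layers, bs, n_hidden)`):

```python
import torch

def resize_hidden(h, bs):
    """Slice or zero-pad the batch dimension of a remembered hidden
    state h of shape (n_layers, old_bs, n_hidden) to match bs."""
    old_bs = h.size(1)
    if bs <= old_bs:
        # keep the first bs columns instead of discarding everything
        return h[:, :bs].contiguous()
    # pad the extra rows with zeros, keeping the remembered part
    pad = h.new_zeros(h.size(0), bs - old_bs, h.size(2))
    return torch.cat([h, pad], dim=1)
```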