In chapter 12, can someone help with understanding why we set the hidden state (h) to 0 and not torch.zeros(n_hidden,n_hidden) ? (This is also a research question). I tried re-reading the chapter but couldn’t think of a reason
The first time through the loop, the scalar h is broadcast across the output of self.i_h; after that addition, h itself is a tensor. Initialising with an explicit zeros tensor has the same effect, as long as its shape is broadcast-compatible with the embedding output, e.g. h=torch.zeros(n_hidden). Note that torch.zeros(n_hidden,n_hidden) only broadcasts against the (batch_size, n_hidden) embedding output when the batch size happens to equal n_hidden.
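A minimal sketch of the broadcasting behaviour (the sizes n_hidden=4 and batch size 2 are just illustrative, and the random tensor stands in for the self.i_h output):

```python
import torch

n_hidden, bs = 4, 2
emb = torch.randn(bs, n_hidden)  # stands in for self.i_h(x[:, i])

# The scalar 0 broadcasts to any shape; after one addition h is a tensor
h = 0
h = h + emb
print(h.shape)  # torch.Size([2, 4])

# An explicit zeros tensor works too, provided its shape is
# broadcast-compatible with the embedding output
h2 = torch.zeros(n_hidden)
h2 = h2 + emb
print(torch.equal(h, h2))  # True
```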
So we can use either if we want?
Initialising it to a scalar zero just makes things easier.
Yes. It is just a Python programming style issue, nothing to do with machine learning in particular.