This is really not that important, but it’s driving me crazy. I’m looking at the following code for the LSTM from lesson 12 (lecture 8):
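(The snippet didn’t come through in my paste, so here is roughly what the cell looks like, reconstructed from memory — the names and the omitted gates are approximate, not the exact notebook code:)

```python
import torch
import torch.nn as nn

# Rough reconstruction from memory -- treat names/details as approximate.
class LSTMCell(nn.Module):
    def __init__(self, ni, nh):
        super().__init__()
        self.forget_gate = nn.Linear(ni + nh, nh)
        # ...input/cell/output gates omitted for brevity...

    def forward(self, input, state):
        h, c = state
        h = torch.stack([h, input], dim=1)   # <- the line I'm asking about
        f = torch.sigmoid(self.forget_gate(h))
        # ...rest of the gate logic omitted...
        return f
```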
Here `h` is a `torch.stack` of the tensors `h` and `input`, which are both two-dimensional, so the resulting `h` should be three-dimensional, since `stack` adds a dimension (in contrast, `torch.cat` does not add a dimension).
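A quick sanity check of the `stack` vs. `cat` behavior I mean (sizes made up):

```python
import torch

bs, n = 5, 4                           # made-up batch and feature sizes
a = torch.randn(bs, n)
b = torch.randn(bs, n)                 # stack requires identical shapes

stacked = torch.stack([a, b], dim=1)   # inserts a new dimension
catted = torch.cat([a, b], dim=1)      # grows an existing dimension

print(stacked.shape)                   # torch.Size([5, 2, 4])
print(catted.shape)                    # torch.Size([5, 8])
```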
`self.forget_gate` is also two-dimensional. So how can the matrix multiplication in `self.forget_gate(h)` be possible? Aren’t we multiplying a three-dimensional tensor by a two-dimensional one?
In the refactored LSTM later in the notebook, I don’t see the same shape problem. What am I missing?
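P.S. One thing I noticed while poking at this, in case it’s relevant: `nn.Linear` does accept inputs with more than two dimensions and simply applies its (2-D) weight over the last dimension, e.g.:

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 3)      # the weight itself is 2-D: shape (3, 4)
x = torch.randn(5, 2, 4)   # a 3-D input

out = lin(x)               # linear is applied over the last dim
print(out.shape)           # torch.Size([5, 2, 3])
```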