RNN loop uses previous time step’s hidden state to predict current output in the PyTorch tutorial

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size

        # Both layers read from the concatenation of the input and the
        # previous hidden state.
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)    # new hidden state for this step
        output = self.i2o(combined)    # output comes from combined, i.e. the previous hidden state
        output = self.softmax(output)
        return output, hidden

Hi, I’ve just begun using PyTorch and was going through the RNN example at https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial. As I understand it, the current time step’s output should be predicted from the current (just-updated) hidden state, but here the output is computed from combined, which contains the previous time step’s hidden state. Can someone explain this? Thanks.
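
For reference, here is a minimal sketch of how this module is driven over a sequence, carrying the hidden state from one step to the next (the sizes below are illustrative, not taken from the tutorial):

input_size, hidden_size, output_size = 57, 128, 18  # illustrative sizes
rnn = RNN(input_size, hidden_size, output_size)

seq = torch.randn(5, 1, input_size)   # dummy 5-step sequence, batch size 1
hidden = torch.zeros(1, hidden_size)  # initial hidden state

for t in range(seq.size(0)):
    # the hidden state returned at step t-1 is fed into step t
    output, hidden = rnn(seq[t], hidden)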


Hi Pramod,

I think your understanding is correct. My best guess is that it is an error in the PyTorch tutorial.

If you go to the tutorial you linked, and then to the tutorial it is derived from at
https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html

you will see that the source tutorial implements the RNN exactly as you understand it: the output is computed from the updated hidden state.

It may be that the two implementations are equivalent, i.e., that i2o somehow manages to absorb the transformation learned by i2h. But that is not obviously true, at least to me. At best, the second implementation is confusing as presented.
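
To make the difference concrete, here is a minimal sketch of the variant you describe, where the output is computed from the freshly updated hidden state (the class and layer names here are illustrative, not copied from either tutorial):

import torch
import torch.nn as nn

class RNNFromHidden(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNNFromHidden, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)  # reads from the hidden state, not from combined
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)               # update the hidden state first
        output = self.softmax(self.h2o(hidden))   # then predict from the new hidden state
        return output, hidden

In this version the output layer sees the current step’s hidden state, so the i2h transformation directly participates in the prediction, which is what the classification tutorial’s code does not do.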

Good catch!!! 🙂


Thank you for your reply. I’ll continue working with my current understanding.