I think Jeremy's implementation of nn.RNN is wrong

In lesson 6 Jeremy reimplemented torch.nn.RNN. But according to the PyTorch docs (https://pytorch.org/docs/stable/nn.html#torch.nn.RNN), it is an Elman RNN, which should be implemented like this (in its simplest form):
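For reference, the hidden-state update that the PyTorch docs give for nn.RNN (with the default tanh nonlinearity) is:

h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)

i.e. the hidden-to-hidden weight W_hh multiplies the previous hidden state alone, and the transformed input is added afterwards.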

def forward(self, c1, c2, c3):
    in1 = self.l_in(self.e(c1))  # instead of F.relu(self.l_in(self.e(c1)))
    in2 = self.l_in(self.e(c2))  # instead of F.relu(self.l_in(self.e(c2)))
    in3 = self.l_in(self.e(c3))  # instead of F.relu(self.l_in(self.e(c3)))

    h = V(torch.zeros(in1.size()).cuda())
    h = F.tanh(self.l_hidden(h) + in1)  # instead of F.tanh(self.l_hidden(h+in1))
    h = F.tanh(self.l_hidden(h) + in2)  # instead of F.tanh(self.l_hidden(h+in2))
    h = F.tanh(self.l_hidden(h) + in3)  # instead of F.tanh(self.l_hidden(h+in3))
    return F.log_softmax(self.l_out(h))
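To make the difference concrete, here is a minimal scalar sketch (the weights and values are made up for illustration) showing that tanh(W_hh*h + in) and tanh(W_hh*(h + in)) are genuinely different computations, not just a reparameterization:

```python
import math

# Hypothetical scalar "weights" and values, purely for illustration
w_in = 0.5      # stands in for the input-to-hidden layer l_in
w_hidden = 0.8  # stands in for the hidden-to-hidden layer l_hidden
x = 1.0         # current input (after embedding)
h = 0.2         # previous hidden state

inp = w_in * x

# Elman-style step, as in the PyTorch docs: apply the hidden weight
# to h alone, then add the transformed input
elman = math.tanh(w_hidden * h + inp)

# Lesson-6 style step: add the input to h first, so the hidden weight
# multiplies (h + input)
lesson6 = math.tanh(w_hidden * (h + inp))

print(elman, lesson6)  # the two formulations give different values
```

In the lesson-6 form the input is effectively passed through l_hidden as well, so the same weight matrix serves both roles; in the Elman form the input and recurrent paths have separate weights.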

And this implementation also matches more closely the architecture diagram Jeremy drew in the PowerPoint slides.

Why wasn't Jeremy accurate here? Is this an error, or was it done deliberately to make the LSTM in the next lesson easier to understand?
