Ch12 Language Model 4 and 5 h.detach() location

DanielLam · May 9, 2020, 4:36am

Hi all,

Is there a discrepancy in where h.detach() is called in language models 4 and 5?

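    def forward_LM4(self, x):
        outs = []
        for i in range(sl):
            self.h = self.h + self.i_h(x[:,i])
            self.h = F.relu(self.h_h(self.h))
            # output layer is applied at every step, before any detach
            outs.append(self.h_o(self.h))
        # detach so gradients do not flow back into the previous batch
        self.h = self.h.detach()
        return torch.stack(outs, dim=1)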

    def forward_LM5(self, x):
        res,h = self.rnn(self.i_h(x), self.h)
        # detach is called here, before the output layer below
        self.h = h.detach()
        return self.h_o(res)

In LM4, self.h_o() is called before the detach, but in LM5 it is called after the detach. Does LM5 have a bug? I think the detach should happen after self.h_o().
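
One way to test whether the order matters is to compare the gradients under both orderings. This is only a minimal sketch, not the book's code: the sizes, the standalone `rnn` and `h_o` layers, and the `grads` helper are all made up for the check, and I am assuming `self.rnn` behaves like an `nn.RNN` with `batch_first=True`. As far as I understand, `h.detach()` returns a new tensor and leaves `h` and `res` attached to the graph, so the two orderings may well turn out equivalent.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small sizes, chosen only for this check
n_hidden, vocab_sz, bs, sl = 8, 5, 2, 3

rnn = nn.RNN(n_hidden, n_hidden, batch_first=True)  # stands in for self.rnn
h_o = nn.Linear(n_hidden, vocab_sz)                 # stands in for self.h_o

def grads(detach_first):
    """Run one forward/backward pass and return a copy of every parameter gradient."""
    rnn.zero_grad()
    h_o.zero_grad()
    x = torch.ones(bs, sl, n_hidden)   # stands in for self.i_h(x)
    h0 = torch.zeros(1, bs, n_hidden)  # initial hidden state
    res, h = rnn(x, h0)
    if detach_first:
        new_h = h.detach()  # LM5 order: detach, then output layer
        out = h_o(res)
    else:
        out = h_o(res)      # proposed order: output layer, then detach
        new_h = h.detach()
    out.sum().backward()
    params = list(rnn.parameters()) + list(h_o.parameters())
    return [p.grad.clone() for p in params]

g_detach_first = grads(True)
g_detach_last = grads(False)

# True if both orderings produce the same gradients for every parameter
same = all(torch.allclose(a, b) for a, b in zip(g_detach_first, g_detach_last))
print(same)
```

If `same` prints True, the position of the detach relative to `self.h_o()` makes no difference to training, and only the hidden state carried into the next batch is cut off from the graph.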

Thanks,
Daniel Lam