In the Lesson 7 Human Numbers notebook there is a difference between the course notebook and the lesson video.

In the lesson video, for the second RNN example (‘Model2’), batch normalisation is applied to the feedback connection ‘h’, whereas in the course notebook batch normalisation is applied only to the output path, not to the feedback.

So the GitHub notebook has this:

```
def forward(self, x):
    h = torch.zeros(x.shape[0], nh).to(device=x.device)
    res = []
    for i in range(x.shape[1]):
        h = h + self.i_h(x[:,i])
        h = F.relu(self.h_h(h))
        res.append(self.h_o(self.bn(h)))
    return torch.stack(res, dim=1)
```

whereas the lesson video shows this:

```
def forward(self, x):
    h = torch.zeros(x.shape[0], nh).to(device=x.device)
    res = []
    for i in range(x.shape[1]):
        h = h + self.i_h(x[:,i])
        h = self.bn(F.relu(self.h_h(h)))
        res.append(self.h_o(h))
    return torch.stack(res, dim=1)
```

i.e. the feedback connection ‘h’ now has batch normalisation applied before being carried into the next time step:

h = **self.bn**(F.relu(self.h_h(h)))

This second version also matches how the feedback is produced in the earlier RNN example (‘Model1’). Indeed, switching to it seems to give slightly better results.
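For anyone who wants to try the video variant in isolation, here is a minimal, self-contained sketch of a full module built around that forward method. The layer names (`i_h`, `h_h`, `h_o`, `bn`) follow the snippets above; the sizes `nv` and `nh` are placeholder values I chose for the example, not the notebook's actual values:

```python
import torch
import torch.nn.functional as F
from torch import nn

# Placeholder sizes standing in for the notebook's vocab size (nv) and hidden size (nh).
nv, nh = 40, 64

class Model2(nn.Module):
    """RNN as shown in the lesson video: batchnorm is applied inside the
    loop, so the normalised activations also feed back through `h`."""
    def __init__(self, nv, nh):
        super().__init__()
        self.i_h = nn.Embedding(nv, nh)   # input -> hidden
        self.h_h = nn.Linear(nh, nh)      # hidden -> hidden (feedback)
        self.h_o = nn.Linear(nh, nv)      # hidden -> output
        self.bn = nn.BatchNorm1d(nh)

    def forward(self, x):
        h = torch.zeros(x.shape[0], nh).to(device=x.device)
        res = []
        for i in range(x.shape[1]):
            h = h + self.i_h(x[:, i])
            h = self.bn(F.relu(self.h_h(h)))  # batchnorm on the feedback connection
            res.append(self.h_o(h))
        return torch.stack(res, dim=1)

# Quick shape check on dummy data: a batch of 8 sequences of length 5.
x = torch.randint(0, nv, (8, 5))
out = Model2(nv, nh)(x)
print(out.shape)  # torch.Size([8, 5, 40])
```

This makes it easy to train both variants side by side on the Human Numbers data and compare validation accuracy directly.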

I presume that the course notebook is wrong and that batch normalisation should be applied to the feedback connection ‘h’?