Lesson 7 - Human Numbers

In the Lesson 7 Human Numbers example there is a difference between the course notebook and the lesson video.

In the lesson video, for the second RNN example (‘Model2’), batch normalisation is applied to the feedback connection ‘h’, whereas in the course notebook batch normalisation is only applied to the output, not the feedback.

So the github notebook has this:

def forward(self, x):
    h = torch.zeros(x.shape[0], nh).to(device=x.device)
    res = []
    for i in range(x.shape[1]):
        h = h + self.i_h(x[:,i])
        h = F.relu(self.h_h(h))
        res.append(self.h_o(self.bn(h)))   # batchnorm applied only on the output path
    return torch.stack(res, dim=1)

whereas the lesson video shows this:

def forward(self, x):
    h = torch.zeros(x.shape[0], nh).to(device=x.device)
    res = []
    for i in range(x.shape[1]):
        h = h + self.i_h(x[:,i])
        h = self.bn(F.relu(self.h_h(h)))   # batchnorm applied to 'h', so the feedback carries it
        res.append(self.h_o(h))
    return torch.stack(res, dim=1)

i.e. the feedback connection ‘h’ now has batchnorm applied:

h = self.bn(F.relu(self.h_h(h)))

This second version would also match how the feedback is created in the earlier RNN example (‘Model1’). Indeed, changing to this seems to give slightly better results.
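
For reference, the layer names used in these forward methods come from the model’s __init__, which in the notebook looks roughly like this (nv is the vocabulary size, nh the hidden size):

def __init__(self):
    super().__init__()
    self.i_h = nn.Embedding(nv, nh)   # input to hidden
    self.h_h = nn.Linear(nh, nh)      # hidden to hidden (the feedback path)
    self.h_o = nn.Linear(nh, nv)      # hidden to output
    self.bn = nn.BatchNorm1d(nh)      # the batchnorm in question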

I presume that the course notebook is wrong and that batch normalisation should be applied to the feedback connection ‘h’?

Similarly, I’ve just found that in the next example, ‘Model3’, batch normalisation also isn’t applied to the feedback connection. In this case the course video does match the notebook.

Without the batch normalisation on the feedback ‘h’ the accuracy tends to be about 55%. Adding batch normalisation to the feedback connection gives a big improvement, with the accuracy now reaching 65-70%.

def forward(self, x):
    res = []
    h = self.h
    for i in range(x.shape[1]):
        h = h + self.i_h(x[:,i])
        # original notebook version:
        # h = F.relu(self.h_h(h))
        # res.append(self.bn(h))
        h = self.bn(F.relu(self.h_h(h)))   # batchnorm now applied to the feedback 'h' as well
        res.append(h)
    # remainder as in the notebook: keep the state across batches and stack the outputs
    self.h = h.detach()
    res = torch.stack(res, dim=1)
    return self.h_o(res)
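
If anyone wants to reproduce the comparison, something along these lines should work with the databunch and imports already set up in the notebook (seq_loss and seq_acc here are just my names for the flattened loss and accuracy; the notebook defines its own equivalents):

def seq_loss(pred, targ): return F.cross_entropy(pred.view(-1, nv), targ.view(-1))
def seq_acc(pred, targ): return accuracy(pred.view(-1, nv), targ.view(-1))

learn = Learner(data, Model3(), loss_func=seq_loss, metrics=seq_acc)
learn.fit_one_cycle(10, 3e-3)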