Fixed! What should the mean and std be for validation in lesson7-cifar10.ipynb? (Batch Norm)

The Batch Norm layer that we looked at in lesson 7 looks like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BnLayer(nn.Module):
    def __init__(self, ni, nf, stride=2, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(ni, nf, kernel_size=kernel_size, stride=stride,
                              bias=False, padding=1)
        # Learnable per-channel shift (a) and scale (m)
        self.a = nn.Parameter(torch.zeros(nf,1,1))
        self.m = nn.Parameter(torch.ones(nf,1,1))

    def forward(self, x):
        x = F.relu(self.conv(x))
        # Flatten everything except the channel dimension
        x_chan = x.transpose(0,1).contiguous().view(x.size(1), -1)
        if self.training:
            # Per-channel statistics of the current mini-batch
            self.means = x_chan.mean(1)[:,None,None]
            self.stds  = x_chan.std(1)[:,None,None]
        return (x - self.means) / self.stds * self.m + self.a

It works if you train and then validate, but not if you load a network and try to validate it without training first. This is because self.means and self.stds are only set when self.training is True. What should I do if I want to be able to use the network for validation (without training) after loading it? Should I perhaps save the std and mean too? Maybe adjusting the code of the BN layer would work as well…

Do we not want to compute a new mean and std on the validation data, because that would change the result? If so, maybe I could add two extra items to learner.model.state_dict() for the last mean and std so they can be saved and loaded somehow.

Is there a way to save and load the mean and std? Do I even need to do that, or is there another way?
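
One possible approach (just my guess at a fix, not code from the lesson) would be to register the statistics as buffers with register_buffer and keep a running average of them, so they end up in state_dict() and can be used at validation time. A rough sketch — the class name, momentum value, and moving-average update below are all made up for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BnLayerRunning(nn.Module):
    # Hypothetical variant of the lesson's BnLayer that keeps running statistics
    def __init__(self, ni, nf, stride=2, kernel_size=3, momentum=0.1):
        super().__init__()
        self.conv = nn.Conv2d(ni, nf, kernel_size=kernel_size, stride=stride,
                              bias=False, padding=1)
        self.a = nn.Parameter(torch.zeros(nf,1,1))
        self.m = nn.Parameter(torch.ones(nf,1,1))
        self.momentum = momentum
        # Buffers are not updated by the optimizer, but they are part of
        # state_dict(), so they survive torch.save / load_state_dict
        self.register_buffer('running_mean', torch.zeros(nf,1,1))
        self.register_buffer('running_std', torch.ones(nf,1,1))

    def forward(self, x):
        x = F.relu(self.conv(x))
        if self.training:
            x_chan = x.transpose(0,1).contiguous().view(x.size(1), -1)
            means = x_chan.mean(1)[:,None,None]
            stds  = x_chan.std(1)[:,None,None]
            # Exponential moving average of the batch statistics
            self.running_mean.mul_(1 - self.momentum).add_(means.detach() * self.momentum)
            self.running_std.mul_(1 - self.momentum).add_(stds.detach() * self.momentum)
        else:
            # At validation/inference time, fall back to the stored running statistics
            means, stds = self.running_mean, self.running_std
        return (x - means) / stds * self.m + self.a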

If I print(self.means[0]) just before the return, I get lots of different means while training, but also after calling:

learn.model.cuda().eval()
set_trainable(learn.model, False)

I think that might be a bug! Or maybe I just don’t understand what is happening in this class at all… Why are the means changing during validation? Are they meant to?


The testing/inference part is not implemented in that BN layer… Should I implement it, or is there a better way than editing that function? Does this mean that the validation set’s loss is calculated incorrectly everywhere there is BN?

If I look at the code of PyTorch’s _BatchNorm, it does have this functionality.

class _BatchNorm(Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1, affine=True):
        super(_BatchNorm, self).__init__()
        self.num_features = num_features
        self.affine = affine
        self.eps = eps
        self.momentum = momentum
        if self.affine:
            self.weight = Parameter(torch.Tensor(num_features))
            self.bias = Parameter(torch.Tensor(num_features))
        else:
            self.register_parameter('weight', None)
            self.register_parameter('bias', None)
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))
        self.reset_parameters()

    def reset_parameters(self):
        self.running_mean.zero_()
        self.running_var.fill_(1)
        if self.affine:
            self.weight.data.uniform_()
            self.bias.data.zero_()

    def forward(self, input):
        return F.batch_norm(
            input, self.running_mean, self.running_var, self.weight, self.bias,
            self.training, self.momentum, self.eps)
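
The important bit is the self.training flag passed to F.batch_norm: in train() mode it normalizes with the current batch’s statistics and updates running_mean/running_var; in eval() mode it uses those stored buffers instead, so nothing is recomputed between batches. A quick way to see this (just an illustrative snippet, not from the notebook):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)

bn.train()
_ = bn(torch.randn(8, 3, 4, 4))   # uses batch stats, updates running_mean/var
print(bn.running_mean)            # no longer all zeros

bn.eval()
_ = bn(torch.randn(8, 3, 4, 4))   # uses the stored running stats
print(bn.running_mean)            # unchanged: nothing is recomputed in eval mode

# The running statistics are buffers, so they show up in state_dict()
print(list(bn.state_dict().keys()))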

To fix not being able to validate right after loading the model, just replace that BnLayer with this:

import torch.nn as nn
import torch.nn.functional as F

class BnLayer(nn.Module):
    def __init__(self, ni, nf, stride=2, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(ni, nf, kernel_size=kernel_size, stride=stride,
                              bias=False, padding=1)
        # nn.BatchNorm2d keeps running_mean/running_var buffers and uses them
        # automatically in eval() mode, so they are saved in state_dict()
        self.batchNorm = nn.BatchNorm2d(nf)

    def forward(self, x):
        return self.batchNorm(F.relu(self.conv(x)))
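
With this version, the running mean and variance live in nn.BatchNorm2d’s buffers, so a plain save/load round-trip is enough to validate without training first. Roughly like this — model, the file name, and val_batch are placeholders for whatever you are actually using:

import torch

# After training, save everything, including the BatchNorm buffers
torch.save(model.state_dict(), 'bn_model.pth')

# Later: rebuild the same architecture, load the weights, and validate
model.load_state_dict(torch.load('bn_model.pth'))
model.eval()                      # normalize with running stats, not batch stats
with torch.no_grad():
    preds = model(val_batch)      # hypothetical validation batch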

Just to clarify - this lesson shows a simplified BN layer to illustrate the basic idea. It’s not meant to be a replacement for PyTorch’s. I discuss some of the missing features of this simplified implementation in the lesson.