Lesson 11 discussion and wiki

I’m trying to wrap my head around 07a_lsuv.ipynb:

    while mdl(xb) is not None and abs(h.mean)  > 1e-3: m.bias -= h.mean
    while mdl(xb) is not None and abs(h.std-1) > 1e-3: m.weight.data /= h.std
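
For context, these lines live inside `lsuv_module`, where `h` is a forward hook on `m`: every `mdl(xb)` call re-runs the forward pass and refreshes `h.mean`/`h.std` with the stats of `m`'s output, which is post-ReLU since `m` is a whole ConvLayer. A minimal sketch of the hook machinery, paraphrased from the course notebooks from memory:

    from functools import partial

    def append_stat(hook, mod, inp, outp):
        # stash mean/std of this module's output on every forward pass
        d = outp.data
        hook.mean, hook.std = d.mean().item(), d.std().item()

    class Hook():
        # minimal forward-hook wrapper, roughly the course's Hook class
        def __init__(self, m, f): self.hook = m.register_forward_hook(partial(f, self))
        def remove(self): self.hook.remove()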

The notebook annotation says:

> Note that the mean doesn’t exactly stay at 0. since we change the standard deviation after by scaling the weight.

That didn’t help me understand why, since dividing by a constant merely scales the mean, so a mean that is already zero should stay at zero.

But there is a ReLU in between: dividing by the std changes the conv2d weights and therefore the pre-activation output, which changes what the ReLU lets through, so the post-ReLU mean drifts away from zero.
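
You can see the effect in isolation: for a positive scale `c`, `relu(z/c) == relu(z)/c`, so rescaling the pre-activations rescales the post-ReLU mean too. A quick check (the variable names are mine):

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    z = torch.randn(100_000)     # pre-activations: mean ~0, std ~1
    print(F.relu(z).mean())      # ~0.40 (1/sqrt(2*pi) for a standard normal)
    print(F.relu(z / 2).mean())  # ~0.20: halving the scale halves the post-ReLU mean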

But wait, let’s flip the two loops and adjust the bias only after fixing the std:

    while mdl(xb) is not None and abs(h.std-1) > 1e-3: m.weight.data /= h.std
    while mdl(xb) is not None and abs(h.mean)  > 1e-3: m.bias -= h.mean

and voilà, now we get mean=0, std=1:

    for m in mods: print(lsuv_module(m, xb))
    (2.9802322387695312e-08, 1.0000479221343994)
    (6.99442237461767e-09, 0.9994299411773682)
    (-4.6566128730773926e-09, 0.9994862079620361)
    (-2.7939677238464355e-08, 0.9999381303787231)
    (-1.1175870895385742e-08, 1.00041663646698)

The flip works because the `sub` shift happens after the ReLU:

    class GeneralRelu(nn.Module):
        [...]
        def forward(self, x):
            x = F.relu(x)  # dropped the leaky_relu branch since this nb doesn't use it
            if self.sub is not None: x.sub_(self.sub)
            return x
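
If I remember the notebook right, `m.bias` on the ConvLayer is a property backed by `-relu.sub`, so `m.bias -= h.mean` really adjusts this post-ReLU shift. A constant shift after the ReLU moves the mean but leaves the std untouched (std is shift-invariant), which is why the mean loop can run last without undoing the std loop. Quick check:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    y = F.relu(torch.randn(100_000))  # post-ReLU activations
    c = y.mean()
    print((y - c).mean())             # ~0: the shift zeroes the mean...
    print((y - c).std(), y.std())     # ...and the std doesn't move at all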

P.S. Shouldn’t those while loops have a safeguard so they can’t loop forever?
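
Probably yes. A sketch of one way to guard it, using the same `mdl`/`Hook`/`append_stat` names as above and an arbitrary iteration budget:

    def lsuv_module(m, xb, max_iter=100):
        h = Hook(m, append_stat)
        # fix the std first, then the mean, each capped at max_iter passes
        for _ in range(max_iter):
            mdl(xb)                          # forward pass refreshes h.std
            if abs(h.std - 1) <= 1e-3: break
            m.weight.data /= h.std
        for _ in range(max_iter):
            mdl(xb)                          # forward pass refreshes h.mean
            if abs(h.mean) <= 1e-3: break
            m.bias -= h.mean
        h.remove()
        return h.mean, h.std

If a layer never converges, this just returns whatever it reached instead of hanging.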
