It’s just `07a_lsuv.ipynb` as is.

Here is another variant I tried, attempting to balance the adjustments, while still having `std=1`, `mean=0` post-`lsuv_module`:
```python
# Hook, append_stat, mdl and xb are all defined in 07a_lsuv.ipynb
def lsuv_module(m, xb):
    h = Hook(m, append_stat)
    mean,std = 1.,0.  # seed values so the first loop check passes
    while mdl(xb) is not None and (abs(mean) > 1e-3 or abs(std-1) > 1e-3):
        mean,std = h.mean,h.std   # stats the hook captured on that forward pass
        if abs(mean)  > 1e-3: m.bias -= mean
        if abs(std-1) > 1e-3: m.weight.data /= std
    h.remove()
    return h.mean,h.std
```
(note: it recalculates mean/std twice, but I didn’t bother refactoring, since it’s just a proof of concept.)
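For completeness, the driver loop is unchanged from the notebook; from memory it's roughly this (`find_modules`, `ConvLayer`, `mdl` and `xb` all come from there, so treat it as a sketch):

```python
# run LSUV over each conv layer in turn, printing the final stats
mods = find_modules(mdl, lambda o: isinstance(o, ConvLayer))
for m in mods: print(lsuv_module(m, xb))
```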
It works better (i.e. it trains), but I'm still getting `nan`s every so often. This is with the default `lr=0.6` of that nb.

But, of course, the original nb gets `nan`s too every so often, so that learning rate is just too high. With a lower `lr=0.1` the original reversed-order std+bias approach trains just fine too.
Here is a refactored “balanced” version:
```python
def lsuv_module(m, xb):
    h = Hook(m, append_stat)
    while mdl(xb) is not None:
        mean,std = h.mean,h.std   # stats from the forward pass we just ran
        if abs(mean) > 1e-3 or abs(std-1) > 1e-3:
            m.bias -= mean
            m.weight.data.div_(std)
        else: break
    h.remove()
    return h.mean,h.std
```
Perhaps `self.sub` in `GeneralReLU` needs to be a parameter; then the init would only affect the initial setting, and the network could tune it up from there.
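A minimal sketch of that idea, assuming a `GeneralReLU` with roughly the `leak`/`sub`/`maxv` signature of the course notebooks (the exact signature here is my assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeneralReLU(nn.Module):
    "Variant where `sub` is a learnable parameter rather than a fixed constant."
    def __init__(self, leak=None, sub=0., maxv=None):
        super().__init__()
        self.leak,self.maxv = leak,maxv
        # registered as a Parameter: init only seeds the starting value,
        # after that the optimizer tunes it like any other weight
        self.sub = nn.Parameter(torch.tensor(float(sub)))

    def forward(self, x):
        x = F.leaky_relu(x, self.leak) if self.leak is not None else F.relu(x)
        x = x - self.sub   # out-of-place, so autograd can update self.sub
        if self.maxv is not None: x = x.clamp_max(self.maxv)
        return x
```

Then whatever value the init picks would just be a starting point (e.g. seeding it with `m.sub.data.fill_(0.4)` for a module `m`), and SGD takes it from there.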