Lesson 11 discussion and wiki

It’s just 07a_lsuv.ipynb as is.

Here is another variant I tried, attempting to balance the two adjustments while still ending up with std=1, mean=0 after lsuv_module:

def lsuv_module(m, xb):
    h = Hook(m, append_stat)

    # each call to mdl(xb) triggers the hook, which refreshes h.mean/h.std
    while mdl(xb) is not None and (abs(h.mean) > 1e-3 or abs(h.std-1) > 1e-3):
        mean,std = h.mean,h.std
        if abs(mean)  > 1e-3: m.bias -= mean           # center the activations
        if abs(std-1) > 1e-3: m.weight.data /= std     # rescale to unit std

    h.remove()
    return h.mean,h.std

(note: it recalculates mean/std twice, but I didn’t bother refactoring, since it’s just a proof of concept.)
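For anyone reading this out of context: the snippet relies on Hook, append_stat, and mdl from the course notebooks. Roughly, from memory of the earlier notebooks (so treat the exact definitions as approximate):

from functools import partial

class Hook():
    def __init__(self, m, f): self.hook = m.register_forward_hook(partial(f, self))
    def remove(self): self.hook.remove()
    def __del__(self): self.remove()

def append_stat(hook, mod, inp, outp):
    d = outp.data
    hook.mean,hook.std = d.mean().item(),d.std().item()

mdl = learn.model.cuda()   # `learn` is the notebook's Learner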

It works better (i.e., it trains), but I'm still getting NaNs every so often. This is with the default lr=0.6 of that notebook.

But, of course, the original notebook gets NaNs every so often too, so that learning rate is simply too high.

With a lower lr=0.1, the original reversed-order std+bias approach trains just fine too.

Here is a refactored “balanced” version:

def lsuv_module(m, xb):
    h = Hook(m, append_stat)

    while mdl(xb) is not None:             # each pass refreshes h.mean/h.std
        mean,std = h.mean,h.std
        if abs(mean) > 1e-3 or abs(std-1) > 1e-3:
            m.bias -= mean                 # center the activations
            m.weight.data.div_(std)        # rescale to unit std
        else: break

    h.remove()
    return h.mean,h.std
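
Either way, it gets applied the same way as in the notebook, e.g. (assuming mods is the list of ConvLayers and xb a batch on the right device, as in 07a):

for m in mods: print(lsuv_module(m, xb))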

Perhaps self.sub in GeneralReLU should be a learnable parameter, so the init only sets its starting value and the network can then tune it during training.
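
A minimal sketch of that idea (assuming the GeneralReLU from the course notebooks; the defaults here are mine):

import torch
from torch import nn
import torch.nn.functional as F

class GeneralReLU(nn.Module):
    def __init__(self, leak=None, sub=0., maxv=None):
        super().__init__()
        self.leak,self.maxv = leak,maxv
        # register the shift as a trainable parameter: LSUV sets its
        # initial value, then the optimizer tunes it during training
        self.sub = nn.Parameter(torch.tensor(float(sub)))

    def forward(self, x):
        x = F.leaky_relu(x, self.leak) if self.leak is not None else F.relu(x)
        x = x - self.sub                 # out-of-place so autograd tracks the shift
        if self.maxv is not None: x = x.clamp_max(self.maxv)
        return x

One wrinkle: ConvLayer's bias setter would then need to assign via self.relu.sub.data = -v, since assigning a plain tensor to an attribute registered as a Parameter raises a TypeError.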
