Lesson 10 Discussion & Wiki (2019)

So I was revisiting lesson 10 and had the same thought: we have this awesome GeneralRelu, so why don’t we just learn all the parameters instead of predefining them? I searched around to see if anyone on here had done it and couldn’t find anything.

So I went ahead and implemented “LearnedRelu”, which was super easy (assuming I did it right):

import torch
import torch.nn as nn

class LearnedRelu(nn.Module):
    def __init__(self, leak=0.1, sub=0.25, maxv=100):
        super().__init__()
        self.leak = nn.Parameter(torch.ones(1)*leak)
        self.sub  = nn.Parameter(torch.zeros(1)+sub)
        self.maxv = nn.Parameter(torch.ones(1)*maxv)

    def forward(self, x):
        # keep the parameters in the graph so they all actually receive gradients
        # (passing .item() to leaky_relu/clamp_max_ would detach leak and maxv)
        x = torch.where(x > 0, x, x*self.leak)  # learnable leak
        x = x - self.sub                        # learnable shift
        return torch.min(x, self.maxv)          # learnable max clamp
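Presumably you would just drop it in wherever GeneralRelu went; e.g. a conv_layer variant along these lines (a sketch based on the conv_layer used later in this thread):

def conv_layer(ni, nf, ks=3, stride=2, bn=True):
    layers = [nn.Conv2d(ni, nf, ks, padding=ks//2, stride=stride, bias=not bn),
              LearnedRelu()]
    if bn: layers.append(nn.BatchNorm2d(nf, eps=1e-5, momentum=0.1))
    return nn.Sequential(*layers)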

So far it seems to work great. I started a separate thread on the topic with a gist of my work so far here: https://forums.fast.ai/t/learning-generalrelu-params-here-is-learnedrelu/44599

A basic (and maybe silly) question regarding callbacks.

In the code below:

from time import sleep

class SlowCalculator():
    def __init__(self, cb=None): self.cb,self.res = cb,0

    def callback(self, cb_name, *args):
        print(cb_name)
        print('self.cb:')
        print(self.cb)
        if not self.cb: return
        cb = getattr(self.cb, cb_name, None)
        print('cb:')
        print(cb)
        if cb: return cb(self, *args)

    def calc(self):
        for i in range(5):
            self.callback('before_calc', i)
            self.res += i*i
            sleep(1)
            if self.callback('after_calc', i):
                print("stopping early")
                break

class ModifyingCallback():
    def after_calc(self, calc, epoch):
        print(f"After {epoch}: {calc.res}")
        if calc.res>10: return True
        if calc.res<3: calc.res = calc.res*2

What is calc in the ModifyingCallback class? How is it being passed in from SlowCalculator?

Yeah, I had the same question. Has this been answered already by any chance? I opened a PR about this: https://github.com/fastai/fastai_docs/pull/107

How do we overcome the tuple error? I am running the notebook in Google Colab.

Just a quick suggestion for the fastai team: I’m not a huge fan of having all the callback names as magic strings. It would be easier to have a class full of constants that defines all the callback names; that would save having to look at the docs or dig into the source code all the time.

Did you see the use of SimpleNamespace in 11a_transfer_learning?
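For reference, SimpleNamespace gives you pretty much what a constants class would, with attribute access and tab-completion instead of magic strings. A minimal sketch (the event names here are purely illustrative, not necessarily the ones in 11a_transfer_learning):

from types import SimpleNamespace

# hypothetical set of callback event names, just to show the idea
cb_types = SimpleNamespace(**{o: o for o in ['begin_fit','begin_batch','after_loss','after_fit']})

cb_types.after_loss   # -> 'after_loss', with tab-completion and typo protection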


For the curious, here’s a plot of the stats for the latest version of RunningBatchNorm from 07_batchnorm.ipynb after a single epoch.

Note how the mean stays within ±0.15 of 0 and the std stays within ±0.3 of 1 for all layers!


For anyone who wants to go deeper into understanding the Batch Norm technique:


Good day friends, quick question about lesson 10: when using our own BatchNorm layer,
if bn: layers.append(BatchNorm(nf))

and then running an epoch and looking at the hooks at the start
for h in hooks[:-1]:

the graphs show that the means of all layers begin at 0 and the stds of all layers begin at 1, all great.

But if I change to the PyTorch BatchNorm layer:

if bn: layers.append(nn.BatchNorm2d(nf, eps=1e-5, momentum=0.1))

then the same hook graphs show that the means still all begin at 0, but not the stds: the stds of the different layers don’t begin at 1, and they start from different positions.

I’ve also checked the graph that shows the percentage of activations that are near zero, and a similar issue appears: when using the fast.ai batchnorm only 20% or so are near zero, but when using the PyTorch BatchNorm the percentage is way higher.

I have checked the code against the notebook and can’t find a glitch. Why the difference? Thank you for the help.

Hi, if you post the code (or a link) of your experiment, it would be easier to help you out. :wink:


Thank you Fabrizio :wink:

So the conv_layer is:

def conv_layer(ni, nf, ks=3, stride=2, bn=True, **kwargs):
    layers = [nn.Conv2d(ni, nf, ks, padding=ks//2, stride=stride, bias=not bn),
              GeneralRelu(**kwargs)]
    if bn: layers.append(nn.BatchNorm2d(nf, eps=1e-5, momentum=0.1))
    return nn.Sequential(*layers)

and then I do:

path = datasets.untar_data(datasets.URLs.IMAGENETTE_160)
bs = 128
tfms = [make_rgb, ResizeFixed(128), to_byte_tensor, to_float_tensor]
il = ImageList.from_files(path, tfms=tfms)
sd = SplitData.split_by_func(il, partial(grandparent_splitter, valid_name='val'))
ll = label_by_func(sd, parent_labeler, proc_y=CategoryProcessor())
train_dl, valid_dl = get_dls(ll.train, ll.valid, bs, num_workers=4)
data = ll.to_databunch(bs, c_in=3, c_out=10, num_workers=4)
nfs = [64,64,128,128]

cbfs = [Recorder,
        partial(AvgStatsCallback,accuracy),
        CudaCallback,
        partial(BatchTransformXCallback, norm_imagenette)]

learn,run = get_learn_run(nfs, data, 0.9, conv_layer, cbs=cbfs)  

with Hooks(learn.model, append_stats) as hooks:
    run.fit(1, learn)
    fig,(ax0,ax1) = plt.subplots(1,2, figsize=(10,4))
    for h in hooks[:-1]:
        ms,ss,hi = h.stats
        ax0.plot(ms[:10])
        ax1.plot(ss[:10])
    plt.legend(range(6));
    
    fig,(ax0,ax1) = plt.subplots(1,2, figsize=(10,4))
    for h in hooks[:-1]:
        ms,ss,hi = h.stats
        ax0.plot(ms)
        ax1.plot(ss)

and append_stats is:

def append_stats(hook, mod, inp, outp):
    if not hasattr(hook,'stats'): hook.stats = ([],[],[])
    means,stds,hists = hook.stats
    if mod.training:
        means.append(outp.data.mean().cpu())
        stds.append(outp.data.std().cpu())
        hists.append(outp.data.cpu().histc(40,-7,7))

thank you again for trying to help :wink:

@fabris Fabrizio, here is the notebook with the code. Please see if you can help me spot the mistake that makes the hooks show the means nicely synchronised but the stds wrong, thank you very much :wink: :
https://colab.research.google.com/drive/1rip1MFYwxbleZfXwH80lbJW16SNdrLFj

Btw, switching to the fast.ai running batchnorm makes it all work great, so I have switched to that. I’m still curious why the std behaves like that when using PyTorch’s batchnorm layer.

I was watching the video and noticed that the training time with running batch norm is just over twice that with the normal batch norm. This seems like a large slowdown, and I was wondering what might cause it and whether it is a concern.

In the 07 notebook, why do we have the parameters gamma and beta to multiply and add after batch norm if we are going to have a linear transformation immediately after? I can see having an add (beta) if the following layer has no bias, but the scaling factor seems like wasted computation.
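For what it’s worth, here is the algebra behind that intuition, assuming nothing nonlinear sits between the batch norm and the next linear layer (as with the conv → relu → bn ordering used earlier in this thread, where the bn output feeds the next conv). The following layer can absorb both the scale and the shift:

$$ W(\gamma \odot \hat{x} + \beta) + b = \big(W\,\mathrm{diag}(\gamma)\big)\,\hat{x} + \big(W\beta + b\big), $$

so in principle both $\gamma$ and $\beta$ could be folded into the next layer’s $W$ and $b$; the empirical question is whether keeping them still helps optimisation.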

I ran 50 trials with 1) the original RunningBatchNorm, 2) removing gamma, 3) removing beta and gamma and adding back the bias to the conv layer, and 4) removing gamma and placing the batchnorm before the ReLU. I found no statistical difference in the validation error between 1), 2), and 3) (p=.8), but 4) was statistically better (p=.00001). Also, the original took roughly 7% longer to run than any of the alternatives.

It looks like it is best to have batchnorm BEFORE the non-linearity, at least in the MNIST 1 epoch example. This results in the epoch running 7% faster with a 15% lower error rate. It should also be fine to remove gamma as long as the activation function is a simple ReLU. Am I missing something?

I was wondering the same thing (sorry for digging up an old post). Could we not have the weights of dead activations reset/re-initialised once in a while, to give them a fresh start for training?

In the last line of the callback function, we have:

if cb: return cb(self, *args)

The “self” here refers to the SlowCalculator instance whose callback method is running; it does not denote the ModifyingCallback instance. That SlowCalculator instance is what arrives as the calc argument in ModifyingCallback.after_calc.
Hopefully this helps.
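A minimal sketch of the flow, using the classes defined above (the variable name is just for illustration):

calculator = SlowCalculator(cb=ModifyingCallback())
calculator.calc()
# inside calc(), self.callback('after_calc', i) ends up running
# ModifyingCallback.after_calc(calculator, i),
# so the calc argument is the SlowCalculator instance itself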

I have a very basic question about using softmax. I am not sure if I understood what Jeremy explained in the lesson.

Is it good practice to use categorical cross entropy as the loss function (it uses softmax) if we are not going to use softmax in production code for our model predictions? Or should we always use binary cross entropy, even if we have a single-label problem?

I have trained an accurate classifier using categorical cross entropy, and I use the output probabilities for several tasks, such as ignoring unknown images. I do not apply softmax to the predictions, but it was used during training.
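Concretely, the setup I mean is roughly this (a sketch with a hypothetical model, x and y):

import torch.nn.functional as F

logits = model(x)                  # raw scores, used directly at training time
loss = F.cross_entropy(logits, y)  # softmax/log-softmax is folded into the loss

preds = logits.argmax(dim=1)       # argmax is unchanged by softmax (it is monotonic)
probs = logits.softmax(dim=1)      # only if calibrated probabilities are wanted,
                                   # e.g. for thresholding unknown images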

Is it a good practice?

Thank you!!

It seems that at the end of notebook 06_cuda_cnn_hooks_init.ipynb, we don’t actually use GeneralReLU, because we don’t pass any arguments to get_learn_run as **kwargs, so we end up using regular ReLU. Am I missing something?
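If get_learn_run really does forward its extra **kwargs down to conv_layer (and so on to GeneralRelu), then something along these lines should exercise the extra parameters (the values here are just hypothetical):

learn,run = get_learn_run(nfs, data, 0.9, conv_layer, cbs=cbfs,
                          leak=0.1, sub=0.4, maxv=6.)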

Hi,
I got an error message when trying to use .var((0,2,3), keepdim=True) in batchnorm: the error says dim takes an int and not a tuple. I was told that this operation is only available in the nightly version of PyTorch.
I installed the nightly version with:

pip install torch -f https://download.pytorch.org/whl/nightly/cu90/torch.html

Though my CUDA version is 9.1, I noticed this only installs torch 1.2.0 and does not install torchvision.
I ran one of the notebooks to test whether it works, and I am getting this error message.

The issue now is that the PyTorch nightly does not have a build for CUDA 9.1. I do have the normal PyTorch and fastai working properly; I created a new environment just to be able to run the .var call with the nightly version.
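For anyone hitting the same tuple error, a possible workaround that avoids the nightly build entirely (a sketch, assuming all you need is the per-channel variance over dims (0,2,3)): flatten those dims into one and call .var over a single dim, which older PyTorch versions accept.

import torch

x = torch.randn(32, 16, 8, 8)  # (batch, channels, H, W)

# move channels first, flatten batch/H/W into one dim, then reduce over it
xc = x.transpose(0, 1).contiguous().view(x.size(1), -1)
m = xc.mean(1, keepdim=True)   # per-channel mean, shape (C, 1)
v = xc.var(1, keepdim=True)    # per-channel variance, shape (C, 1)

# reshape to (1, C, 1, 1) if you need to broadcast back over the input
m = m.view(1, -1, 1, 1)
v = v.view(1, -1, 1, 1)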

Hi,

Can someone explain the m.register_forward_hook(partial(f, self)) part that is used inside the Hook class?

When Jeremy did the same thing with a function, he used m.register_forward_hook(partial(append_stats, i)), where i is the index of the current module.

How can the self in the partial act the same way as the i index did?

Thanks!
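In case a concrete sketch helps (this is just the binding mechanism, not the exact notebook code): PyTorch always calls a forward hook as hook(module, input, output), and partial pre-fills one extra argument in front of those three.

from functools import partial

def f(first, module, inp, outp):
    # `first` is whatever partial bound in advance: an index, a Hook object, anything
    print(first)

# partial(f, i)    -> the hook fires as f(i, module, input, output)
# partial(f, self) -> the hook fires as f(self, module, input, output),
# so the Hook object plays the same role the index i did: a place to hang state.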
