bn_unfreeze(True)

Hello everyone,

I was trying to code the cat vs dog problem, but as soon as I execute “learn.bn_unfreeze(True)” it gives an error like:

AttributeError: ‘ConvLearner’ object has no attribute ‘bn_unfreeze’

I have searched over the internet and the forums but couldn’t find any solution.
Any suggestions will be appreciated.

Regards
Anurag Tamboli

Anyone know the difference between these two methods:

  • unfreeze(self)
  • bn_freeze(self, do_freeze), when do_freeze is set to False

Any guesses as to what the bn stands for?

It’s pretty clear from the code below that unfreeze sets the children in the model to trainable:

    def freeze_to(self, n):
        c=self.get_layer_groups()
        for l in c:     set_trainable(l, False)
        for l in c[n:]: set_trainable(l, True)

    def unfreeze(self): self.freeze_to(0)

But what are children? It looks like children is a PyTorch concept: a module’s children are its immediate sub-modules, which here means the layers in the model. Is this correct? Do all the children represent all the layers? If so, then I think calling unfreeze makes it so that the weights for all of the layers get updated when we do our training during fit.
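
To make “children” concrete, here’s a small standalone PyTorch sketch (my own illustration, not the fastai source): children() yields a model’s immediate sub-modules, and freezing one is roughly a matter of toggling requires_grad on its parameters, which is what I understand set_trainable to do:

    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
    )

    # children() yields the immediate sub-modules (the top-level "layers")
    for child in model.children():
        print(type(child).__name__)

    # a stand-in for set_trainable: freeze or unfreeze a module by toggling
    # requires_grad on all of its parameters
    def set_trainable(module, trainable):
        for p in module.parameters():
            p.requires_grad = trainable

    for child in model.children():
        set_trainable(child, False)    # freeze everything...
    set_trainable(model[-1], True)     # ...then unfreeze just the last child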

It’s a little less clear what bn_freeze(False) would do because you have to understand what apply_leaf does and how the model’s bn_freeze attribute is used, as you can see below:

    def set_bn_freeze(self, m, do_freeze):
        if hasattr(m, 'running_mean'): m.bn_freeze = do_freeze

    def bn_freeze(self, do_freeze):
        apply_leaf(self.model, lambda m: self.set_bn_freeze(m, do_freeze))

It looks like apply_leaf recursively applies the above lambda to all the children in the model; the lambda sets the bn_freeze attribute to False (in our example) on every module that has a running_mean attribute. See the apply_leaf code here:

    def apply_leaf(m, f):
        c = children(m)
        if isinstance(m, nn.Module): f(m)
        if len(c)>0:
            for l in c: apply_leaf(l,f)
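
To see what that traversal visits, here’s a small standalone version (with a stand-in for the fastai children helper) run over a tiny model; only the BatchNorm module reports a running_mean buffer, which is what set_bn_freeze keys on:

    import torch.nn as nn

    def children(m): return list(m.children())   # stand-in for the fastai helper

    def apply_leaf(m, f):
        c = children(m)
        if isinstance(m, nn.Module): f(m)
        if len(c) > 0:
            for l in c: apply_leaf(l, f)

    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())

    # print each visited module and whether it carries a 'running_mean' buffer
    apply_leaf(model, lambda m: print(type(m).__name__, hasattr(m, 'running_mean')))
    # Sequential False / Conv2d False / BatchNorm2d True / ReLU False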

And it looks like the bn_freeze attribute is only used in one place in the fastai code, inside model.py:

    def set_train_mode(m):
        if (hasattr(m, 'running_mean') and
            (getattr(m,'bn_freeze',False) or not getattr(m,'trainable',False))): m.eval()
        else: m.train()

The set_train_mode function gets called when we do our training during fit. Looks like if bn_freeze is False then train is called, so long as trainable is also set to True, which I think it is.

So this looks similar to unfreeze, in that it trains all the children when bn_freeze is set to False. I’m sure I’m missing something, though, since why would we have two different ways of doing the same thing? My guess is that trainable being set to True must mean something different than bn_freeze being set to False.
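
If that guess is right, the two flags would control different things. Here’s a small standalone PyTorch sketch of what I think each one does (not fastai code): requires_grad (what set_trainable toggles) decides whether a layer’s weights get gradient updates, while putting a BatchNorm layer in eval() (which is what set_train_mode does when bn_freeze is set) only stops its running statistics from updating; its learnable weight and bias can still train:

    import torch
    import torch.nn as nn

    bn = nn.BatchNorm2d(3)
    x = torch.randn(8, 3, 4, 4)

    # "trainable" is about the parameters: requires_grad decides whether
    # bn.weight / bn.bias get gradients and hence get updated by the optimiser
    for p in bn.parameters():
        p.requires_grad = True

    # "bn_freeze" (via set_train_mode -> m.eval()) is about the statistics:
    # in eval mode the running mean/var are used as-is and never updated
    bn.eval()

    loss = bn(x).sum()
    loss.backward()
    print(bn.weight.grad is not None)  # True: the BN parameters still get gradients
    print(bn.running_mean)             # still all zeros: the statistics were frozen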


Looks like bn stands for batch normalization (a concept I believe we’re going to go over in the next lecture), and also Jeremy talks about bn_freeze in the following post: [Adv] Significant changes to fastai just pushed

From the post, Jeremy says:

I discovered that inceptionresnet-v2 and inception-v4 were not training well on dogs v cats after unfreezing. I think I’ve tracked it down to an interesting issue with batchnorm. Basically, updating the batchnorm moving statistics causes these models to fall apart pretty badly. So I’ve added a new learn.bn_freeze(True) method to freeze all bn statistics. This should only be called with precompute=False, and after training the fully connected layers for at least one epoch. I’d be interested to hear if people find this new option helps any models they’ve been fine-tuning.

Finally, I’ve changed the meaning of the parameter to freeze_to() so it now refers to the index of a layer group, not of a layer. I think this is more convenient and less to learn for students, since we use layer groups when we set learning rates, so I think this method should be consistent with that.

And it’s discussed here: Freezing batch norm, but this discussion seems more advanced and hard for me to understand.

I don’t see how we’re able to tell which layers are batch norm layers, but perhaps bn_freeze(False) somehow only unfreezes the layers that are batch norm, and that’s the difference between it and unfreeze.
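
Putting Jeremy’s note together with how it’s used in the lesson notebook, I think the call sequence during fine-tuning looks roughly like this (the data-loading lines are written from memory and may be slightly off; the bn_freeze / unfreeze / precompute parts are the ones discussed above):

    from fastai.conv_learner import *   # fastai (0.7-era) imports, from memory
    import numpy as np

    PATH = 'data/dogscats/'             # hypothetical data path
    arch = resnet34
    sz = 224
    data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))

    learn = ConvLearner.pretrained(arch, data, precompute=True)
    learn.fit(1e-2, 1)                  # train the new fully connected head first

    learn.precompute = False            # bn_freeze should only be used with precompute=False
    learn.bn_freeze(True)               # freeze the batchnorm running statistics
    learn.unfreeze()                    # make all layer groups trainable
    learn.fit(np.array([1e-4, 1e-3, 1e-2]), 1)   # differential learning rates per layer group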


Yeah this definitely isn’t a beginner topic :wink: Basically, if you’re using datasets similar to imagenet, try using bn_freeze when fine tuning, and see if it helps.

BatchNorm will only update the running averages in train mode, because at test time, as Jeremy said today, we don't do the normalisation from the batch statistics (I can't recall exactly; it was something like this, or slightly different, in today's lecture).

Is this the right inference?
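
A quick PyTorch sketch of that behaviour (my own check, not from the lecture): the running statistics only change after a forward pass in train() mode, while in eval() mode the layer normalises with the stored statistics and leaves them untouched:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    bn = nn.BatchNorm2d(3)
    x = torch.randn(8, 3, 4, 4)

    print(bn.running_mean)                       # starts at zeros

    bn.train()
    bn(x)                                        # train mode: updates running_mean/var
    print(bn.running_mean)                       # now non-zero

    frozen = bn.running_mean.clone()
    bn.eval()
    bn(x)                                        # eval mode: uses the stored stats, no update
    print(torch.equal(bn.running_mean, frozen))  # True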

Sorry if this is a very basic question:

Are ReLU after MaxPool and MaxPool after ReLU equivalent operations?

For further insights…(to-do list)

“Glossary of Deep Learning: Batch Normalisation” @jaroncollis https://medium.com/deeper-learning/glossary-of-deep-learning-batch-normalisation-8266dcd2fa82

“Batch normalization in Neural Networks” @phidaouss https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c

Good question. Have a think about it, and tell us what you think the answer is! Try looking at the Excel conv spreadsheet if you’re having trouble…

Assuming the ReLU used is max(0, x):

What differs the most is when the negative values are thrown away (this may or may not be correct):

  • MaxPool after ReLU
    Since the ReLU has already been applied, all the negative values are lost before pooling, and those values might have played a role.
  • ReLU after MaxPool
    Since the MaxPool comes first, the negative values are still there when we pool, so they might still play a role, but...

But here’s the problem…
Whether we apply the ReLU before or after the pooling, the negative values either get thrown away or become zero either way, since taking max(0, x) before or after taking the max over the pooling window gives the same result (and it wouldn't matter even if we did an AvgPool, because the average of positive values stays positive). Neurons might become inactive for essentially all inputs in that case. In this state, no gradients flow backward through the neuron, so the neuron becomes stuck in a perpetually inactive state and ultimately it dies. Also, it should be computationally cheaper to do the MaxPool first and the ReLU after, since the ReLU then operates on fewer values(?). A quick numerical check of the max-pool case is sketched below.
Am I missing something else?
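
A quick check in plain PyTorch (my own sketch, max pooling only):

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    x = torch.randn(1, 3, 8, 8)        # random feature map with positive and negative values

    a = F.max_pool2d(F.relu(x), 2)     # ReLU first, then MaxPool
    b = F.relu(F.max_pool2d(x, 2))     # MaxPool first, then ReLU

    # identical, because max(0, max(window)) == max over the window of max(0, x)
    print(torch.equal(a, b))           # True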

Thanks…


That all sounds right to me - nice job! Not sure about the performance implication - it’s hard to know with massively parallel architectures like GPUs exactly how performance is impacted.
