What do you think of this approach to fine-tuning?
Freeze the first N layer blocks and leave every layer after them trainable, while keeping ALL batch normalization layers trainable.
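For illustration, here is a minimal sketch in PyTorch of what I mean (the ResNet-50 backbone and the cutoff `N_FROZEN_BLOCKS` are just placeholders for the example, not part of the original setup):

```python
import torch.nn as nn
from torchvision import models

N_FROZEN_BLOCKS = 6  # placeholder cutoff; pick N for your own task

# Example backbone only; any pretrained CNN with BatchNorm would work the same way.
model = models.resnet50(weights="IMAGENET1K_V1")

# Freeze the first N top-level blocks (conv1, bn1, relu, maxpool, layer1, ...).
for block in list(model.children())[:N_FROZEN_BLOCKS]:
    for param in block.parameters():
        param.requires_grad = False

# Blocks after the first N keep requires_grad=True by default, so they stay trainable.

# Re-enable gradients for every BatchNorm layer, even inside the frozen blocks,
# so their affine parameters (weight/bias) can still adapt to the new data.
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        for param in module.parameters():
            param.requires_grad = True

# Note: in model.train() mode, BatchNorm running statistics are updated during the
# forward pass regardless of requires_grad.
```

My guess is that the idea is that the BatchNorm statistics and affine parameters learned on the pretraining data may not match the new data distribution, so they are left free to adapt even inside otherwise frozen blocks, but I would like to hear other opinions.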
Could anyone explain this way of fine-tuning?