Here’s a bit I’ve found:
To apply different hyper-parameters to different groups (differential learning rates, or no weight decay for certain layers for instance), you will need to adjust those values after the init.
Otherwise (i.e. if you want the same values for every group), I believe you can just pass them at the init?
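If it helps, here's a minimal sketch of what "adjusting after the init" can look like with fastai's Optimizer. The set_hyper call and the per-parameter state dict are the bits I mean; the three groups, the values and the weight_decay/sgd_step callbacks are just made up for illustration:

import torch
from fastai.optimizer import Optimizer, sgd_step, weight_decay

# three hypothetical param groups (what a splitter would hand you)
params = [[torch.randn(4, requires_grad=True) for _ in range(2)] for _ in range(3)]
opt = Optimizer(params, [weight_decay, sgd_step], lr=0.1, wd=1e-2)

# differential learning rates: one value per param group, set after the init
opt.set_hyper('lr', [1e-4, 1e-3, 1e-2])

# no weight decay for one particular parameter: flag it in its per-parameter state
opt.state[params[0][0]] = {'do_wd': False}

When opt.step() runs, each group then uses its own lr, and that one flagged parameter gets skipped by the weight-decay callback.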
Here’s another bit I’ve read:
Parameters such as batchnorm weights/bias can be marked to always be in training mode: just put force_train=True in their state.
With the tests:
params = [tst_params(), tst_params(), tst_params()]
opt = Optimizer(params, sgd_step, lr=0.1)
for p in L(params[1])[[1,3]]: opt.state[p] = {'force_train': True}
opt.freeze()
test_eq(L(params[0]).map(req_grad), [False]*4)
test_eq(L(params[1]).map(req_grad), [False, True, False, True])
test_eq(L(params[2]).map(req_grad), [True]*4)
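For reference, that cell uses a couple of notebook helpers that aren't shown here; my guess at what they do (the real definitions are earlier in 12_optimizer, so treat these as stand-ins) is something like:

import torch
from fastcore.foundation import L
from fastcore.test import test_eq
from fastai.optimizer import Optimizer, sgd_step

# stand-ins for the notebook helpers, just so the snippet above can run
def tst_params(): return [torch.tensor([float(i)], requires_grad=True) for i in range(4)]
def req_grad(p): return p.requires_grad

The point of the test: freeze() switches off requires_grad for every group except the last one, but any parameter whose state carries force_train=True stays trainable, which is why params[1][1] and params[1][3] come back True while the rest of their group is False.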
Maybe this is a place to start looking? (This is from the 12_optimizer notebook.)
And here is exactly what you want, I think:
def create_opt(self):
    self.opt = self.opt_func(self.splitter(self.model), lr=self.lr)
    if not self.wd_bn_bias:
        for p in self._bn_bias_state(True): p['do_wd'] = False
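From what I can tell, that 'do_wd' flag is then read by the weight-decay step callback, which skips the parameter when the flag is False. Roughly (my paraphrase of the mechanism, not a verbatim copy of the source):

# sketch: a per-parameter 'do_wd' flag honoured by the weight-decay step callback
def weight_decay(p, lr, wd, do_wd=True, **kwargs):
    # decay the weights before the step, unless this parameter opted out
    if do_wd and wd != 0: p.data.mul_(1 - lr*wd)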
There is a parameter wd_bn_bias=False, along with train_bn=True (both of these were found in Learner).
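So if my reading is right, you just set those two when you build the Learner; a minimal sketch (dls, model and loss_func are placeholders for whatever you already have):

from fastai.learner import Learner

# dls, model and loss_func are assumed to exist already
learn = Learner(dls, model, loss_func=loss_func,
                train_bn=True,     # keep BatchNorm layers in training mode even when frozen
                wd_bn_bias=False)  # no weight decay for bn/bias parameters
learn.create_opt()  # this is where the create_opt snippet above runs and sets do_wd
                    # (and, I assume, force_train for train_bn) on learn.opt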
I’m not 100% sure that’s exactly what you’re looking for (I’m still trying to understand the rabbit hole myself), but these should be the related bits.