Wiki topic for the new callback and Learner/optimizer systems.
Hey Jeremy! Thanks for the new sub-forum and thank you and sgugger et al for all the work you have been doing for v2!
For those who do not know, a group of us on the forum recently worked on finding the state of the art for Imagewoof, and we implemented a number of ideas. One of them is what we call fit_fc, Flat Cosine Annealing.
How different is the new callback system from before? And what are the major changes we should expect? We'd like to get this implemented for both libraries because of how well it works with the new optimizers (Ranger, RAdam, etc.).
For the fit in question, it is here:
class FlatCosAnnealScheduler(LearnerCallback):
    """
    Manage FCFit training as found in the ImageNette experiments.
    Code format is based on OneCycleScheduler
    """
    def __init__(self, learn:Learner, lr:float=4e-3, moms:Floats=(0.95,0.999),
                 start_pct:float=0.72, start_epoch:int=None, curve='cosine',
                 tot_epochs:int=2):
        super().__init__(learn)
        n = len(learn.data.train_dl)
        # Number of batches at constant lr, then number of batches spent annealing
        self.anneal_start = int(n*tot_epochs*start_pct)
        self.batch_finish = (n*tot_epochs - self.anneal_start)
        phase0 = TrainingPhase(self.anneal_start).schedule_hp('lr', lr)
        if curve=="cosine":
            curve_type=annealing_cos
        elif curve=="linear":
            curve_type=annealing_linear
        elif curve=="exponential":
            curve_type=annealing_exp
        else:
            raise ValueError(f"annealing type not supported {curve}")
        phase1 = TrainingPhase(self.batch_finish).schedule_hp('lr', lr, anneal=curve_type)
        phases = [phase0, phase1]
        self.phases,self.start_epoch = phases,start_epoch

    def on_train_begin(self, epoch:int, **kwargs:Any)->None:
        "Initialize the schedulers for training."
        res = {'epoch':self.start_epoch} if self.start_epoch is not None else None
        self.start_epoch = ifnone(self.start_epoch, epoch)
        self.scheds = [p.scheds for p in self.phases]
        self.opt = self.learn.opt
        for k,v in self.scheds[0].items():
            v.restart()
            self.opt.set_stat(k, v.start)
        self.idx_s = 0
        return res

    def jump_to_epoch(self, epoch:int)->None:
        for _ in range(len(self.learn.data.train_dl) * epoch):
            self.on_batch_end(True)

    def on_batch_end(self, train, **kwargs:Any)->None:
        "Take a step in lr,mom sched, start next stepper when the current one is complete."
        if train:
            if self.idx_s >= len(self.scheds): return {'stop_training': True, 'stop_epoch': True}
            sched = self.scheds[self.idx_s]
            for k,v in sched.items(): self.opt.set_stat(k, v.step())
            if list(sched.values())[0].is_done: self.idx_s += 1
Thanks for the question @muellerzr. The new callback system is extremely close to what we built in the latest "from the foundations" course. So have a look at that, then look at the new repo, and let us know if you have any questions. There are new annealing approaches that should make your callback way simpler, I think.
I've been playing around with the callback system and found that visualizing the callbacks in the training loop is helpful for understanding training in fastai, in the same way that learn.summary() is helpful for understanding the model.
Do you think this would be helpful in the library?
learn.show_training_loop()
Output:
Start Fit
   - begin_fit      : [TrainEvalCallback, Recorder, ProgressCallback, MixedPrecision]
  Start Epoch Loop
     - begin_epoch    : [Recorder, ProgressCallback]
    Start Train
       - begin_train    : [TrainEvalCallback, Recorder, ProgressCallback]
      Start Batch Loop
         - begin_batch    : [MixedPrecision]
         - after_pred     : [MixedPrecision]
         - after_loss     : [MixedPrecision]
         - after_backward : [MixedPrecision]
         - after_step     : [MixedPrecision]
         - after_cancel_batch: []
         - after_batch    : [TrainEvalCallback, Recorder, ProgressCallback]
      End Batch Loop
    End Train
     - after_cancel_train: [Recorder]
     - after_train    : [Recorder, ProgressCallback]
    Start Valid
       - begin_validate : [TrainEvalCallback, Recorder, ProgressCallback]
      Start Batch Loop
         - **CBs same as train batch**: []
      End Batch Loop
    End Valid
     - after_cancel_validate: [Recorder]
     - after_validate : [Recorder, ProgressCallback]
  End Epoch Loop
   - after_cancel_epoch: []
   - after_epoch    : [Recorder]
End Fit
 - after_cancel_fit: []
 - after_fit      : [ProgressCallback, MixedPrecision]
Code for generating the snippet above:
def get_ordered_cbs(cbs, step):
    return [cb for cb in sort_by_run(cbs) if hasattr(cb, step)]

@patch
def show_training_loop(learn:Learner):
    loop = ['Start Fit', 'begin_fit', 'Start Epoch Loop', 'begin_epoch',
            'Start Train', 'begin_train', 'Start Batch Loop', 'begin_batch',
            'after_pred', 'after_loss', 'after_backward', 'after_step',
            'after_cancel_batch', 'after_batch','End Batch Loop',
            'End Train', 'after_cancel_train', 'after_train',
            'Start Valid', 'begin_validate',
            'Start Batch Loop', '**CBs same as train batch**', 'End Batch Loop',
            'End Valid', 'after_cancel_validate', 'after_validate',
            'End Epoch Loop', 'after_cancel_epoch', 'after_epoch', 'End Fit', 'after_cancel_fit', 'after_fit']
    indent = 0
    for s in loop:
        if s.startswith('Start'): print(f'{" "*indent}{s}'); indent += 2
        elif s.startswith('End'): indent -= 2; print(f'{" "*indent}{s}')
        else: print(f'{" "*indent} - {s:15}:', get_ordered_cbs(learn.cbs, s))
This looks lovely, can you make a PR to add it in the Learner notebook?
Thanks! Done
@sgugger I finished porting over the CosineAnnealing scheduler for the new optimizers. When I'm exporting the notebooks to use in the library, how should I go about this? I ask specifically because there's the callback along with fit_fc. Should I do a 16a_callback_cos_anneal notebook and have both? Or what would you recommend?
Thanks!
The code in question:
#export
def FlatCosAnnealScheduler(self:Learner, lr:float=4e-3, tot_epochs:int=1, start_pct:float=0.72,
                           curve='cosine'):
    n = len(self.dbunch.train_dl)
    anneal_start = int(n * tot_epochs * start_pct)
    batch_finish = ((n * tot_epochs) - anneal_start)
    if curve=="cosine":        curve_sched=SchedCos(lr, 0)
    elif curve=="linear":      curve_sched=SchedLin(lr, 0)
    elif curve=="exponential": curve_sched=SchedExp(lr, 0)
    else: raise ValueError(f"annealing type not supported {curve}")
    # Constant lr for the first start_pct of training, then anneal for the rest
    scheds = [SchedNo(lr, start_pct), curve_sched]
    scheds = {'lr': combine_scheds([start_pct, 1-start_pct], scheds)}
    return scheds

def fit_fc(learn, tot_epochs:int=1, lr:float=1e-2, start_pct:float=0.72):
    "Fit a model with Flat Cosine Annealing"
    max_lr = learn.lr
    callbacks = ParamScheduler(FlatCosAnnealScheduler(learn, lr, start_pct=start_pct, tot_epochs=tot_epochs))
    learn.fit(tot_epochs, max_lr, cbs=callbacks)
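For anyone who wants to see the shape of the schedule without the library, here is a minimal, library-free sketch of a flat-then-cosine learning rate as a function of training progress. The helper name flat_cos_lr and its parameters are made up for illustration and are not part of fastai or fastai2; it only mirrors the idea of holding the lr constant for the first start_pct of training and then annealing to 0 with a half cosine.

```python
import math

def flat_cos_lr(pos: float, lr: float = 4e-3, start_pct: float = 0.72) -> float:
    """Learning rate at training progress `pos` in [0, 1].

    Hypothetical helper for illustration only, not a fastai function.
    """
    if not 0.0 <= pos <= 1.0:
        raise ValueError(f"pos must be in [0, 1], got {pos}")
    if not 0.0 <= start_pct < 1.0:
        raise ValueError(f"start_pct must be in [0, 1), got {start_pct}")
    if pos < start_pct:
        return lr  # flat phase: constant learning rate
    # Rescale the remaining progress to [0, 1], then follow a half cosine from lr to 0
    p = (pos - start_pct) / (1.0 - start_pct)
    return lr * (1.0 + math.cos(math.pi * p)) / 2.0

# One value per batch boundary, for an assumed 100 batches per epoch and 5 epochs
n_batches, tot_epochs = 100, 5
total = n_batches * tot_epochs
schedule = [flat_cos_lr(i / total) for i in range(total + 1)]
```

Plotting schedule should show a flat line for the first 72% of the batches followed by a smooth drop to zero.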
In 2019 Part 2, notebook 11a_transfer_learning, we created a splitter function that was used to split the parameters of a model into groups that could be trained at different rates.
In fastai2, the splitter function fails a little test with an XResnet model I was using because the number of parameters in the two groups doesn't add up to the total number of parameters in the model.
I've dug into it and it appears that the parameters are double-counted within a ResBlock (at least, using l.children() in the splitter function).
Here's the splitter function:
def bn_splitter(m):
    def _bn_splitter(l, g1, g2):
        if isinstance(l, nn.BatchNorm2d): g2 += l.parameters()
        elif hasattr(l, 'weight'): g1 += l.parameters()
        for ll in l.children(): _bn_splitter(ll, g1, g2)

    g1,g2 = [],[]
    _bn_splitter(m[0], g1, g2)

    g2 += m[1:].parameters()
    return g1,g2
and you use it like this:
m = learn.model
g1,g2 = bn_splitter(m)
but this test fails with my xresnet50:
test_eq(len(g1)+len(g2), len(list(m.parameters())))
The body of the model (m[0]) has 8 parts. Here are their parameter counts:
print("LayerType, n_parameters")
for l in m[0].children(): print(type(l),len(list(l.parameters())))
LayerType, n_parameters
<class 'fastai2.layers.ConvLayer'> 3
<class 'fastai2.layers.ConvLayer'> 3
<class 'fastai2.layers.ConvLayer'> 3
<class 'torch.nn.modules.pooling.MaxPool2d'> 0
<class 'torch.nn.modules.container.Sequential'> 30
<class 'torch.nn.modules.container.Sequential'> 39
<class 'torch.nn.modules.container.Sequential'> 57
<class 'torch.nn.modules.container.Sequential'> 30
The parameter count for the parts agrees with the parameter count for the whole body:
3+3+3+30+39+57+30, len(list(m[0].parameters()))
165, 165
Now look at item 4, the first Sequential object, which has 30 parameters:
- The first item is a ResBlock consisting of 3 ConvLayers plus a skip connection.
- Blocks 2 and 3 don't have the skip connection.
- Each conv layer has 3 parameters (2 for the batchnorm + 1 for the conv).
So the parameter counts for item 4 appear to be correct:
for l in m[0][4].children(): print(type(l),len(list(l.parameters())))
<class 'fastai2.layers.ResBlock'> 12
<class 'fastai2.layers.ResBlock'> 9
<class 'fastai2.layers.ResBlock'> 9
But if we zoom in one more level, the parameter count for the first ResBlock is wrong: it seems to be double-counting the ConvLayers (if I've interpreted it correctly):
for l in m[0][4][0].children(): print(type(l),len(list(l.parameters())))
<class 'torch.nn.modules.container.Sequential'> 9
<class 'torch.nn.modules.container.Sequential'> 9 #this one is double-counted
<class 'torch.nn.modules.container.Sequential'> 3
<class 'torch.nn.modules.activation.ReLU'> 0
The bn_splitter function seems to be using these double-counts, because it comes up with a number that is much too high.
I don't see exactly how to fix this. I'd appreciate another set of eyes on it. Thanks! (and apologies for my lack of skill in formatting).
@sgugger would you be willing to have a look at the previous post? I think the issue is that ResBlock.children() returns too many items. I suspect it's related to the way the model description prints out. In the above example, m[0][4][0] is a ResBlock. But the printout shows this:
(convs)
… a bunch of ConvLayers
(convpath)
… the SAME ConvLayers (if I understand right)
(idpath)
… potentially a ConvLayer for the skip connection
I just can't figure out how to tell the bn_splitter function to ignore the items in convpath.
Thanks!
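To illustrate the suspected cause, here is a toy sketch in plain Python (no PyTorch). It assumes, as guessed above, that two containers inside the block hold the same layers, so a recursive walk over children() reaches those layers twice. ToyParam, ToyModule, collect_naive and collect_unique are hypothetical stand-ins written for this example; tracking id() of parameters already seen is one possible workaround, not the fix that went into the library.

```python
class ToyParam:
    """Stand-in for a parameter tensor (hypothetical, for illustration)."""
    def __init__(self, name):
        self.name = name

class ToyModule:
    """Stand-in for a module with its own params and child modules."""
    def __init__(self, params=(), children=()):
        self.params = list(params)
        self._children = list(children)
    def children(self):
        return self._children

def collect_naive(m, out):
    # Recursive walk like the splitter: shared layers are visited once per container
    out += m.params
    for c in m.children():
        collect_naive(c, out)
    return out

def collect_unique(m, out, seen=None):
    # Same walk, but skip any parameter object that was already collected
    seen = set() if seen is None else seen
    for p in m.params:
        if id(p) not in seen:
            seen.add(id(p))
            out.append(p)
    for c in m.children():
        collect_unique(c, out, seen)
    return out

conv1 = ToyModule([ToyParam("conv1.weight")])
conv2 = ToyModule([ToyParam("conv2.weight")])
convs = ToyModule(children=[conv1, conv2])
convpath = ToyModule(children=[conv1, conv2])  # second container, same layers
idpath = ToyModule([ToyParam("idpath.weight")])
block = ToyModule(children=[convs, convpath, idpath])

naive = collect_naive(block, [])    # conv params appear twice
unique = collect_unique(block, [])  # each param appears once
print(len(naive), len(unique))
```

In this toy block the naive walk collects 5 entries for 3 distinct parameters, which is the same kind of mismatch as the failing test_eq above.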
This was a problem in ResBlock indeed. Should be fixed now.
Excellent, thanks very much. I'll give it a try tomorrow.
Does anyone know if load_learner (the function for loading an exported model) got rolled into Learner.load in fastai2, or does it have a different name now? I can't find it in the docs and the examples all stop short of re-loading an exported model.
Look at the lesson 2 notebooks under course. It's torch.load() now.
Excellent, thanks, I had missed that. It certainly needs to be better documented.
BTW, I really enjoyed your "A walk with fastai2"; very useful indeed.
Documentation is still being done since it's a hard alpha at the moment.
And thank you!
Understood! I didn't mean to sound critical; rather, to flag it as something that someone seeing the library for the first time (me) struggled with.
I just found my bug: I was setting an attribute on the callback when I thought I was setting it on the learner, i.e. self.bla instead of self.learn.bla. There is a note about this in the learner docs/nbs.
I have made a PR to add a warning if you set an attribute in your callback when there is an attribute in the learner with the same name.
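For anyone who hasn't hit this yet, here is a small stand-alone sketch of the pitfall and of the kind of warning described above. ToyLearner and ToyCallback are made-up classes, not the fastai2 ones, and the warning logic is only my illustration of the idea, not the code in the PR. It assumes the callback reads missing attributes from the learner via __getattr__, so reads are delegated but writes are not.

```python
import warnings

class ToyLearner:
    """Hypothetical stand-in for a learner with one attribute."""
    def __init__(self):
        self.lr = 0.1

class ToyCallback:
    """Hypothetical callback: reads fall back to the learner, writes stay local."""
    def __init__(self, learn):
        self.learn = learn

    def __getattr__(self, name):
        # Only called when normal lookup fails, so delegate to the learner
        if name == "learn":
            raise AttributeError(name)
        return getattr(self.learn, name)

    def __setattr__(self, name, value):
        # Warn when the write would shadow an attribute that lives on the learner
        learn = self.__dict__.get("learn")
        if learn is not None and hasattr(learn, name):
            warnings.warn(f"'{name}' exists on the learner; "
                          f"use self.learn.{name} to change it there")
        super().__setattr__(name, value)

learn = ToyLearner()
cb = ToyCallback(learn)
read_before = cb.lr  # delegated read of learn.lr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cb.lr = 0.5  # sets an attribute on the callback only

shadowed = (cb.lr, learn.lr)  # callback now hides the learner's value
cb.learn.lr = 0.5             # the intended way to update the learner
```

After the first assignment the learner still has its old lr, which is exactly the silent bug; the second assignment goes through self.learn and actually changes it.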
Notebook 13:
tst_cb.call_me = lambda: print("maybe")
Love to see it
Awesome idea.
Hi everyone, I've been running through notebook 13 and had 2 quick questions. Would love it if somebody could help me out on either:
- When we use the context manager added_cbs() at the start of fit() and validate() in the Learner class, what is the reason for removing all callbacks from self.cbs? Is it just in case one of the callbacks is a hook, and therefore needs to be removed to avoid unnecessary computation?
- What is the difference between the 2 arguments in the __init__() for Learner, cbs and cb_funcs? When should I use one over the other?
Thanks