Fastai v2 callbacks / learner / optimizer

Wiki topic for the new callback and Learner/optimizer systems.

Hey Jeremy! Thanks for the new sub-forum and thank you and sgugger et al for all the work you have been doing for v2!

For those who do not know, a group of us on the forum recently worked on finding the state of the art for Imagewoof, and we implemented a number of ideas. One of them is what we call fit_fc, Flat Cosine Annealing.

How different is the new callback system from the old one, and what major changes should we expect? We’d like to implement this for both libraries because it works so well with the new optimizers (Ranger, RAdam, etc.).

The callback in question is here:

class FlatCosAnnealScheduler(LearnerCallback):
    """
    Manage FCFit training as found in the ImageNette experiments. 
    Code format is based on OneCycleScheduler
    """
    def __init__(self, learn:Learner, lr:float=4e-3, moms:Floats=(0.95,0.999),
               start_pct:float=0.72, start_epoch:int=None,curve='cosine',
               tot_epochs:int=2):
        super().__init__(learn)
        n = len(learn.data.train_dl)
        self.anneal_start = int(n*tot_epochs*start_pct)
        self.batch_finish = (n*tot_epochs - self.anneal_start)
        phase0 = TrainingPhase(self.anneal_start).schedule_hp('lr', lr)
        if curve=="cosine":
            curve_type=annealing_cos
        elif curve=="linear":
            curve_type=annealing_linear
        elif curve=="exponential":
            curve_type=annealing_exp
        else:
            raise ValueError(f"annealing type not supported {curve}")
        phase1 = TrainingPhase(self.batch_finish).schedule_hp('lr', lr, anneal=curve_type)
        phases = [phase0, phase1]
        self.phases,self.start_epoch = phases,start_epoch


    def on_train_begin(self, epoch:int, **kwargs:Any)->None:
        "Initialize the schedulers for training."
        res = {'epoch':self.start_epoch} if self.start_epoch is not None else None
        self.start_epoch = ifnone(self.start_epoch, epoch)
        self.scheds = [p.scheds for p in self.phases]
        self.opt = self.learn.opt
        for k,v in self.scheds[0].items(): 
            v.restart()
            self.opt.set_stat(k, v.start)
        self.idx_s = 0
        return res
    
    def jump_to_epoch(self, epoch:int)->None:
        for _ in range(len(self.learn.data.train_dl) * epoch):
            self.on_batch_end(True)

    def on_batch_end(self, train, **kwargs:Any)->None:
        "Take a step in lr,mom sched, start next stepper when the current one is complete."
        if train:
            if self.idx_s >= len(self.scheds): return {'stop_training': True, 'stop_epoch': True}
            sched = self.scheds[self.idx_s]
            for k,v in sched.items(): self.opt.set_stat(k, v.step())
            if list(sched.values())[0].is_done: self.idx_s += 1
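
For readers who want to see the shape of the schedule without running fastai, here is a minimal standalone sketch. It is not the callback above: `flat_cos_schedule` and its arguments are hypothetical names, it only reuses the callback's arithmetic (the split at int(n*tot_epochs*start_pct)), and it assumes a cosine decay from lr down to roughly 0.

```python
import math

def flat_cos_schedule(n_batches, tot_epochs, lr=4e-3, start_pct=0.72):
    """Return one learning rate per training iteration (hypothetical helper)."""
    total = n_batches * tot_epochs
    anneal_start = int(total * start_pct)   # same split as in the callback
    n_anneal = total - anneal_start         # iterations spent annealing
    lrs = []
    for i in range(total):
        if i < anneal_start:
            lrs.append(lr)                  # flat phase: constant lr
        else:
            p = (i - anneal_start) / n_anneal             # progress in [0, 1)
            lrs.append(lr * (1 + math.cos(math.pi * p)) / 2)  # half-cosine decay
    return lrs

lrs = flat_cos_schedule(n_batches=100, tot_epochs=5)
```

With 100 batches per epoch and 5 epochs, roughly the first 72% of iterations stay at lr and the rest decay along a half cosine.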

Thanks for the question @muellerzr. The new callback system is extremely close to what we built in the latest “from the foundations” course. So have a look at that, then look at the new repo, and let us know if you have any questions. There are new annealing approaches that should make your callback way simpler, I think.

I’ve been playing around with the callback system and found that visualizing the callbacks in the training loop is helpful for understanding training in fastai in the same way that learn.summary() is helpful for understanding the model.

Do you think this would be helpful in the library?

learn.show_training_loop()

Output:

Start Fit
   - begin_fit      : [TrainEvalCallback, Recorder, ProgressCallback, MixedPrecision]
  Start Epoch Loop
     - begin_epoch    : [Recorder, ProgressCallback]
    Start Train
       - begin_train    : [TrainEvalCallback, Recorder, ProgressCallback]
      Start Batch Loop
         - begin_batch    : [MixedPrecision]
         - after_pred     : [MixedPrecision]
         - after_loss     : [MixedPrecision]
         - after_backward : [MixedPrecision]
         - after_step     : [MixedPrecision]
         - after_cancel_batch: []
         - after_batch    : [TrainEvalCallback, Recorder, ProgressCallback]
      End Batch Loop
    End Train
     - after_cancel_train: [Recorder]
     - after_train    : [Recorder, ProgressCallback]
    Start Valid
       - begin_validate : [TrainEvalCallback, Recorder, ProgressCallback]
      Start Batch Loop
         - **CBs same as train batch**: []
      End Batch Loop
    End Valid
     - after_cancel_validate: [Recorder]
     - after_validate : [Recorder, ProgressCallback]
  End Epoch Loop
   - after_cancel_epoch: []
   - after_epoch    : [Recorder]
End Fit
 - after_cancel_fit: []
 - after_fit      : [ProgressCallback, MixedPrecision]

Code for generating the snippet above:

def get_ordered_cbs(cbs, step):
    return [cb for cb in sort_by_run(cbs) if hasattr(cb, step)]

@patch
def show_training_loop(learn:Learner):
    loop = ['Start Fit', 'begin_fit', 'Start Epoch Loop', 'begin_epoch',
            'Start Train', 'begin_train', 'Start Batch Loop', 'begin_batch', 
            'after_pred', 'after_loss', 'after_backward', 'after_step', 
            'after_cancel_batch', 'after_batch','End Batch Loop',
            'End Train', 'after_cancel_train', 'after_train',
            'Start Valid', 'begin_validate',
            'Start Batch Loop', '**CBs same as train batch**', 'End Batch Loop',
            'End Valid', 'after_cancel_validate', 'after_validate',
            'End Epoch Loop', 'after_cancel_epoch', 'after_epoch', 'End Fit', 'after_cancel_fit', 'after_fit']
    indent = 0
    for s in loop:
        if s.startswith('Start'): print(f'{" "*indent}{s}'); indent += 2
        elif s.startswith('End'): indent -= 2; print(f'{" "*indent}{s}')
        else: print(f'{" "*indent} - {s:15}:', get_ordered_cbs(learn.cbs, s))
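
The lookup done by get_ordered_cbs can be tried without fastai. The sketch below is a toy under stated assumptions: the callback classes and the `order` attribute are hypothetical, and a plain sort on `order` stands in for sort_by_run, which may order callbacks differently in the library.

```python
class ToyCallback:
    order = 0  # hypothetical: lower values run first

class TrainEval(ToyCallback):
    order = -10
    def begin_fit(self): pass
    def after_batch(self): pass

class Recorder(ToyCallback):
    order = 50
    def begin_fit(self): pass
    def after_batch(self): pass

class MixedPrecision(ToyCallback):
    order = 10
    def begin_batch(self): pass

def cbs_for(cbs, event):
    """Names of the callbacks that define a method called `event`, in run order."""
    ordered = sorted(cbs, key=lambda cb: cb.order)
    return [type(cb).__name__ for cb in ordered if hasattr(cb, event)]

cbs = [Recorder(), MixedPrecision(), TrainEval()]
for event in ("begin_fit", "begin_batch", "after_pred", "after_batch"):
    print(f" - {event:15}:", cbs_for(cbs, event))
```

An event with no matching callback simply gets an empty list, which is why rows such as after_cancel_batch show [] in the output above.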

This looks lovely, can you make a PR to add it in the Learner notebook?

Thanks! Done :slight_smile:

@sgugger I finished porting over the CosineAnnealing scheduler for the new optimizers. How should I go about exporting the notebooks for use in the library, given that there’s the callback along with fit_fc? Should I do a 16a_callback_cos_anneal notebook and have both? Or what would you recommend?

Thanks!

The code in question:

#export
def FlatCosAnnealScheduler(self:Learner, lr:float=4e-3, tot_epochs:int=1, start_pct:float=0.72,
                           curve='cosine'):
  n = len(self.dbunch.train_dl)
  anneal_start = int(n * tot_epochs * start_pct)
  batch_finish = ((n * tot_epochs) - anneal_start)
  if curve=="cosine":        curve_sched=SchedCos(lr, 0)
  elif curve=="linear":      curve_sched=SchedLin(lr, 0)
  elif curve=="exponential": curve_sched=SchedExp(lr, 0)
  else: raise ValueError(f"annealing type not supported {curve}")
  scheds = [SchedNo(lr, start_pct), curve_sched]
  scheds = {'lr': combine_scheds([start_pct, 1-start_pct], scheds)}
  return scheds

def fit_fc(learn, tot_epochs:int=1, lr:float=1e-2, start_pct:float=0.72):
    "Fit a model with Flat Cosine Annealing"
    max_lr = learn.lr
    callbacks = ParamScheduler(FlatCosAnnealScheduler(learn, lr, start_pct=start_pct, tot_epochs=tot_epochs))
    learn.fit(tot_epochs, max_lr, cbs=callbacks)
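
To make the combination step concrete, here is a rough pure-Python sketch of how a piecewise schedule can be stitched together from position-based pieces. The helpers `sched_no`, `sched_cos` and `combine` are hypothetical stand-ins written for illustration, not the fastai2 functions used above, and their behavior is only assumed to be similar.

```python
import math

def sched_no(start, end):
    """Constant piece: always returns `start` (`end` is ignored)."""
    return lambda pos: start

def sched_cos(start, end):
    """Cosine piece: goes from `start` at pos=0 to `end` at pos=1."""
    return lambda pos: end + (start - end) * (1 + math.cos(math.pi * pos)) / 2

def combine(pcts, scheds):
    """Chain `scheds`, each covering the matching fraction in `pcts` of training."""
    assert len(pcts) == len(scheds) and all(p > 0 for p in pcts)
    assert abs(sum(pcts) - 1.0) < 1e-9
    bounds = [0.0]
    for p in pcts:
        bounds.append(bounds[-1] + p)       # cumulative segment boundaries
    def _inner(pos):
        for i, sched in enumerate(scheds):
            lo, hi = bounds[i], bounds[i + 1]
            if pos < hi or i == len(scheds) - 1:
                # rescale the global position to this segment's own [0, 1]
                return sched((pos - lo) / (hi - lo))
    return _inner

lr_sched = combine([0.72, 0.28], [sched_no(1e-2, 1e-2), sched_cos(1e-2, 0)])
```

Each piece only sees its own local position, so the flat part and the annealed part can be swapped independently.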

In 2019 Part 2, notebook 11a_transfer_learning, we created a splitter function that was used to split the parameters of a model into groups that could be trained at different rates.

In fastai2, the splitter function fails a little test with an XResnet model I was using because the parameter counts of the two groups don’t add up to the total number of parameters in the model.

I’ve dug into it and it appears that the parameters are double-counted within a ResBlock (at least, using l.children() in the splitter function).

Here’s the splitter function:

def bn_splitter(m):
    def _bn_splitter(l, g1, g2):
        if isinstance(l, nn.BatchNorm2d): g2 += l.parameters()
        elif hasattr(l, 'weight'): g1 += l.parameters()
        for ll in l.children(): _bn_splitter(ll, g1, g2)
        
    g1,g2 = [],[]
    _bn_splitter(m[0], g1, g2)
    
    g2 += m[1:].parameters()
    return g1,g2

and you use it like this:

m = learn.model  
g1,g2 = bn_splitter(m)

but this test fails with my xresnet50:

test_eq(len(g1)+len(g2), len(list(m.parameters())))

The body of the model (m[0]) has 8 parts. Here are their parameter counts:

print("LayerType, n_parameters")
for l in m[0].children(): print(type(l),len(list(l.parameters())))

LayerType, n_parameters
<class 'fastai2.layers.ConvLayer'> 3
<class 'fastai2.layers.ConvLayer'> 3
<class 'fastai2.layers.ConvLayer'> 3
<class 'torch.nn.modules.pooling.MaxPool2d'> 0
<class 'torch.nn.modules.container.Sequential'> 30
<class 'torch.nn.modules.container.Sequential'> 39
<class 'torch.nn.modules.container.Sequential'> 57
<class 'torch.nn.modules.container.Sequential'> 30

The parameter count for the parts agrees with the parameter count for the whole body:
3+3+3+30+39+57+30, len(list(m[0].parameters()))
165, 165

Now look at item 4, the first Sequential object, which has 30 parameters:

  • The first item is a ResBlock consisting of 3 ConvLayers plus a skip connection.
  • Blocks 2 and 3 don’t have the skip connection.
  • Each conv layer has 3 parameters (2 for the batchnorm + 1 for the conv)

So the parameter counts for item 4 appear to be correct:

for l in m[0][4].children(): print(type(l),len(list(l.parameters())))

<class 'fastai2.layers.ResBlock'> 12
<class 'fastai2.layers.ResBlock'> 9
<class 'fastai2.layers.ResBlock'> 9

But if we zoom in one more level, the parameter count for the first ResBlock is wrong: it seems to be double-counting the ConvLayers (if I’ve interpreted it correctly):

for l in m[0][4][0].children(): print(type(l),len(list(l.parameters())))

<class 'torch.nn.modules.container.Sequential'> 9
<class 'torch.nn.modules.container.Sequential'> 9 #this one is double-counted
<class 'torch.nn.modules.container.Sequential'> 3
<class 'torch.nn.modules.activation.ReLU'> 0

The bn_splitter function seems to be using these double-counts, because it comes up with a number that is much too high.

I don’t see exactly how to fix this. I’d appreciate another set of eyes on it. Thanks! (and apologies for my lack of skill in formatting).

@sgugger would you be willing to have a look at the previous post? I think the issue is that ResBlock.children() returns too many items. I suspect it’s related to the way the model description prints out. In the above example, m[0][4][0] is a ResBlock. But the printout shows this:
(convs)
… a bunch of ConvLayers
(convpath)
… the SAME ConvLayers (if I understand right)
(idpath)
… potentially a ConvLayer for the skip connection

I just can’t figure out how to tell the bn_splitter function to ignore the items in convpath.
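
One possible workaround, sketched on a toy rather than on the real model: remember what has already been collected and skip anything seen before. `ToyModule` is a hypothetical stand-in for nn.Module (plain objects play the role of parameters), and the layout with convs, convpath and idpath is only assumed from the printout above.

```python
class ToyModule:
    """Hypothetical stand-in for nn.Module: own parameters plus named children."""
    def __init__(self, n_params=0, **children):
        self.own_params = [object() for _ in range(n_params)]
        self.named = children
    def children(self):
        return list(self.named.values())

def naive_collect(m):
    """Recurse over children() and collect everything, shared layers included twice."""
    out = list(m.own_params)
    for c in m.children():
        out += naive_collect(c)
    return out

def dedup_collect(m, seen=None):
    """Same walk, but skip parameters whose id() was already collected."""
    seen = set() if seen is None else seen
    out = []
    for p in m.own_params:
        if id(p) not in seen:
            seen.add(id(p))
            out.append(p)
    for c in m.children():
        out += dedup_collect(c, seen)
    return out

layers = [ToyModule(3) for _ in range(3)]            # 3 conv layers, 3 params each
convs = ToyModule(**{f"l{i}": l for i, l in enumerate(layers)})
convpath = ToyModule(**{f"l{i}": l for i, l in enumerate(layers)})  # same layers again
idpath = ToyModule(skip=ToyModule(3))                # conv layer on the skip path
block = ToyModule(convs=convs, convpath=convpath, idpath=idpath)

print(len(naive_collect(block)), len(dedup_collect(block)))
```

The naive walk counts the shared layers once per container (9 + 9 + 3), while the de-duplicated walk arrives at the 12 reported for the first ResBlock.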

Thanks!

This was a problem in ResBlock indeed. Should be fixed now.

Excellent, thanks very much. I’ll give it a try tomorrow.

Does anyone know if load_learner (the function for loading an exported model) got rolled into Learner.load in fastai2, or does it have a different name now? I can’t find it in the docs and the examples all stop short of re-loading an exported model.

Look at the lesson 2 notebooks under course. It’s torch.load() now.

Excellent, thanks–I had missed that. It certainly needs to be better documented.

BTW, I really enjoyed your “A walk with fastai2”–very useful indeed.

Documentation is still being done since it’s a hard alpha at the moment :wink:

And thank you!

Understood! I didn’t mean to sound critical; rather, to flag it as something that someone seeing the library for the first time (me) struggled with.

I just found a bug in my code: I was setting an attribute on the callback when I thought I was setting it on the learner, i.e. self.bla instead of self.learn.bla. There is a note about this in the learner docs/nbs.

I have made a PR to add a warning if you set an attribute in your callback when the learner already has an attribute with the same name.
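
For anyone who has not hit this yet, the pitfall can be reproduced in a few lines of plain Python. This is only a sketch under assumptions: `ToyLearner` and `ToyCallback` are hypothetical, the delegation via `__getattr__` is assumed to resemble what the library does, and the warning shows one possible check rather than the code in the PR.

```python
import warnings

class ToyLearner:
    def __init__(self):
        self.lr = 0.1

class ToyCallback:
    def __init__(self, learn):
        self.learn = learn

    def __getattr__(self, name):
        # Only called when normal lookup fails: fall back to the learner.
        if name == "learn":
            raise AttributeError(name)
        return getattr(self.learn, name)

    def __setattr__(self, name, value):
        # Warn when the write would shadow an attribute of the learner.
        learn = self.__dict__.get("learn")
        if learn is not None and name != "learn" and hasattr(learn, name):
            warnings.warn(f"'{name}' also exists on the learner; "
                          f"did you mean self.learn.{name}?")
        super().__setattr__(name, value)

learn = ToyLearner()
cb = ToyCallback(learn)

read_before = cb.lr                  # read is delegated to the learner
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cb.lr = 0.5                      # write lands on the callback only
learner_after_shadow = learn.lr      # the learner is unchanged
cb.learn.lr = 0.5                    # the intended write
```

Reads fall through to the learner but writes do not, which is why the bug is easy to miss.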

Notebook 13:
tst_cb.call_me = lambda: print("maybe")

Love to see it

Awesome idea.

Hi everyone, I’ve been running through notebook 13 and had 2 quick questions. Would love it if somebody could help me out on either:

  1. When we use the context manager added_cbs() at the start of fit() and validate() in the Learner class, what is the reason for removing all callbacks from self.cbs? Is it just in case one of the callbacks is a hook, which would need to be removed to avoid unnecessary computation?

  2. What is the difference between the two arguments to Learner’s __init__(), cbs and cb_funcs? When should I use one over the other?
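
On the first question, the general pattern of a context manager that adds callbacks for the duration of one call can be sketched as follows. This is not the notebook 13 code: `ToyLearner` is hypothetical, strings stand in for callback objects, and the try/finally is a choice made for this sketch. Here only the temporarily added callbacks are removed, which is worth checking against the notebook.

```python
from contextlib import contextmanager

class ToyLearner:
    def __init__(self, cbs=None):
        self.cbs = list(cbs or [])        # callbacks that persist across calls

    @contextmanager
    def added_cbs(self, cbs):
        self.cbs.extend(cbs)              # make the extra callbacks active
        try:
            yield self
        finally:
            for cb in cbs:                # remove them again, even on error
                self.cbs.remove(cb)

    def fit(self, n_epoch, cbs=()):
        with self.added_cbs(list(cbs)):
            return list(self.cbs)         # callbacks active during this fit

learn = ToyLearner(cbs=["Recorder"])
active = learn.fit(1, cbs=["ParamScheduler"])
```

The point of the clean-up in this sketch is that callbacks passed to one fit call do not leak into the next one.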

Thanks