Fastai v2 callbacks / learner / optimizer

I adjusted the EarlyStoppingCallback so that it instead runs for all of your epochs and then reports back which epoch had the highest monitored value, and what that value was during training:

from fastai.callback.tracker import TrackerCallback

class TrackHighestEpoch(TrackerCallback):
    "A `TrackerCallback` that monitors training for improvements and reports back at the end"
    def __init__(self, monitor='accuracy', comp=None, min_delta=0.):
        super().__init__(monitor=monitor, comp=comp, min_delta=min_delta)

    def before_fit(self):
        "Reset the record of the best epoch"
        super().before_fit()
        self.highest_epoch = 0
    def after_epoch(self):
        "Compare the value monitored to its best score and record the epoch of any new best"
        super().after_epoch()
        if self.new_best: self.highest_epoch = self.epoch
    def after_fit(self):
        "Print the highest value and the epoch it occurred at"
        print(f'Highest {self.monitor} of: {self.best} at epoch {self.highest_epoch+1}')
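
For reference, it can be attached like any other callback; here is a minimal usage sketch, where dls and model are just placeholders for your own data and model:

from fastai.basics import *

learn = Learner(dls, model, metrics=accuracy,
                cbs=TrackHighestEpoch(monitor='accuracy'))
learn.fit_one_cycle(10)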

I plan on adding a gradient accumulation callback that accumulates gradients over n batches, to add more stability when the model is too big to allow a good batch size.

I looked at MixedPrecision, as I think it may need similar modifications (which will make it hard to keep the two callbacks compatible).

My approach would be to raise CancelBatchException in after_loss until I reach n batches. Does that seem reasonable?
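
For what it's worth, here is a minimal sketch of that idea, not a finished implementation: the class name and the `n_acc` parameter are made up, and it assumes the training loop exposes the un-detached loss as `learn.loss_grad`, so a manual backward call can accumulate gradients before the rest of the batch is cancelled:

from fastai.callback.core import Callback, CancelBatchException

class GradAccumSketch(Callback):
    "Accumulate gradients over `n_acc` batches before allowing an optimizer step"
    def __init__(self, n_acc=4): self.n_acc = n_acc
    def before_fit(self): self.count = 0
    def after_loss(self):
        if not self.training: return
        self.count += 1
        if self.count % self.n_acc != 0:
            # Backpropagate now so gradients accumulate across batches, then
            # cancel the rest of the batch to skip backward/step/zero_grad
            self.learn.loss_grad.backward()
            raise CancelBatchException()

One thing to watch: since the gradients of n_acc batches get summed, the loss (or learning rate) would probably also need scaling by 1/n_acc to keep the effective step size comparable.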


That sounds reasonable to me.
Also, on a related note, I'm planning a callback to implement a couple of SGD variations (min-k loss and ordered SGD), and it would be interesting to also test blending your gradient accumulation in!

Hey Zach, it has been a while since I have gotten to play with fastai, but I am working on a project and would like to implement early stopping on recall (or balanced accuracy), as that is what our business mostly focuses on.

I have a lot of metrics because I just think it's awesome to see them, and how easy it is to add them is great. I only have 3 epochs going, but it's all in the spirit of experimentation with fastai. The default monitor is valid_loss, and I saw an article in which Jeremy says that this is the best way to go about it, but I would like to concentrate on the metric if I can.

Any thoughts on how to go about this?

To give a little more clarity into why I want to move in this direction:

We are seeing improvement in the recall as the model learns. There are 14 million observations, so I would expect there to be some gain within the first 5 epochs.
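
In case a sketch helps while you wait for a fuller answer: EarlyStoppingCallback already takes a monitor argument, so something along these lines should work. The dls and model names are placeholders, and the monitor string has to match the metric's column name as it appears in the training output (Recall() should show up as 'recall_score'):

from fastai.basics import *
from fastai.callback.tracker import EarlyStoppingCallback

# Higher recall is better, so `comp=np.greater` (which is also the default
# for any monitor whose name doesn't contain 'loss' or 'error');
# `patience` gives training a couple of epochs of slack before stopping
learn = Learner(dls, model, metrics=[Recall(), BalancedAccuracy()],
                cbs=EarlyStoppingCallback(monitor='recall_score',
                                          comp=np.greater, patience=2))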