I created a custom scheduler, OneCycleLRWithPlateau, that adds a middle plateau phase to the OneCycleLR scheduler, letting you hold the max learning rate for a tunable percentage of the training cycle. It is defined in the snippet below:
from torch.optim import lr_scheduler


class OneCycleLRWithPlateau(lr_scheduler.OneCycleLR):
    """OneCycleLR with an extra plateau phase that holds max_lr for
    pct_plateau of the cycle between the warm-up and the anneal."""

    def __init__(self, *args, pct_start=0.1, pct_plateau=0.2, **kwargs):
        kwargs["pct_start"] = pct_start
        super().__init__(*args, **kwargs)
        # Replace the parent's two phases (warm-up, anneal) with three:
        # warm-up -> plateau at max_lr -> anneal back down.
        self._schedule_phases = [
            {
                'end_step': float(pct_start * self.total_steps) - 1,
                'start_lr': 'initial_lr',
                'end_lr': 'max_lr',
                'start_momentum': 'max_momentum',
                'end_momentum': 'base_momentum',
            },
            {
                'end_step': float((pct_start + pct_plateau) * self.total_steps) - 2,
                'start_lr': 'max_lr',
                'end_lr': 'max_lr',
                'start_momentum': 'base_momentum',
                'end_momentum': 'base_momentum',
            },
            {
                'end_step': self.total_steps - 1,
                'start_lr': 'max_lr',
                'end_lr': 'initial_lr',
                'start_momentum': 'base_momentum',
                'end_momentum': 'max_momentum',
            },
        ]
Using the configuration below increased the accuracy of @christopherthomas’s most recent submission from 94.5% to 94.7% over 5 epochs.
sched = partial(OneCycleLRWithPlateau, max_lr=0.01, total_steps=tmax, pct_start=0.1, pct_plateau=0.6)
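For context, `partial` here just pre-binds the hyperparameters; the training loop later calls `sched(optimizer)` to finish construction. A minimal illustration of that pattern, using a stand-in class (`DummySched` is hypothetical, only to show the deferred call):

```python
from functools import partial


class DummySched:
    # Stand-in with the same calling convention: optimizer first,
    # hyperparameters as keyword arguments.
    def __init__(self, optimizer, max_lr, total_steps, pct_start, pct_plateau):
        self.optimizer = optimizer
        self.max_lr = max_lr


sched = partial(DummySched, max_lr=0.01, total_steps=500,
                pct_start=0.1, pct_plateau=0.6)
s = sched("opt")   # the framework supplies the optimizer later
print(s.max_lr)    # 0.01
```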
Reproducibility has been a challenge: the configuration above generally produces a pre-TTA accuracy of 94.3%, but the post-TTA accuracy can fluctuate from 94.5% to 94.7%.