I created a custom scheduler, OneCycleLRWithPlateau, that adds a middle plateau phase to the OneCycleLR scheduler, letting you hold the max learning rate for a tunable percentage of the training cycle. It is defined in the snippet below:
from torch.optim import lr_scheduler


class OneCycleLRWithPlateau(lr_scheduler.OneCycleLR):
    """OneCycleLR with an extra plateau phase that holds max_lr for
    pct_plateau of the cycle between the warm-up and the anneal."""

    def __init__(self, *args, pct_start=0.1, pct_plateau=0.2, **kwargs):
        kwargs["pct_start"] = pct_start
        super().__init__(*args, **kwargs)
        # Replace the parent's two phases (warm-up, anneal) with three:
        # warm-up -> plateau at max_lr -> anneal back down.
        self._schedule_phases = [
            {
                'end_step': float(pct_start * self.total_steps) - 1,
                'start_lr': 'initial_lr',
                'end_lr': 'max_lr',
                'start_momentum': 'max_momentum',
                'end_momentum': 'base_momentum',
            },
            {
                'end_step': float((pct_start + pct_plateau) * self.total_steps) - 2,
                'start_lr': 'max_lr',
                'end_lr': 'max_lr',
                'start_momentum': 'base_momentum',
                'end_momentum': 'base_momentum',
            },
            {
                'end_step': self.total_steps - 1,
                'start_lr': 'max_lr',
                'end_lr': 'initial_lr',
                'start_momentum': 'base_momentum',
                'end_momentum': 'max_momentum',
            },
        ]
Using the configuration below increased the accuracy of @christopherthomas’s most recent submission from 94.5% to 94.7% over 5 epochs.
sched = partial(OneCycleLRWithPlateau, max_lr=0.01, total_steps=tmax, pct_start=0.1, pct_plateau=0.6)
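For context, `partial` here just pre-binds the hyperparameters; the training loop later calls `sched(optimizer)` to finish construction. A minimal illustration of that pattern, using a stand-in class (`DummySched` is hypothetical, only to show the deferred call):

```python
from functools import partial


class DummySched:
    # Stand-in with the same calling convention: optimizer first,
    # hyperparameters as keyword arguments.
    def __init__(self, optimizer, max_lr, total_steps, pct_start, pct_plateau):
        self.optimizer = optimizer
        self.max_lr = max_lr


sched = partial(DummySched, max_lr=0.01, total_steps=500,
                pct_start=0.1, pct_plateau=0.6)
s = sched("opt")   # the framework supplies the optimizer later
print(s.max_lr)    # 0.01
```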
Reproducibility has been a challenge: the configuration above generally produces a pre-TTA accuracy of 94.3%, but the post-TTA accuracy can fluctuate from 94.5% to 94.7%.