Null result: Adding 'spikes' to one_cycle LR sched

Jeremy notes that Imagenette was created “…to quickly see if my algorithm ideas might have a chance of working. They normally don’t…”

Thought I’d share one such idea that doesn’t work (as far as I can tell): adding a large-amplitude, high-frequency oscillation on top of the one_cycle schedule. So far, nothing I’ve tried with this idea seems to ‘help’ (or ‘hurt’).

The motivation: a too-large learning rate can lead to poor convergence, and even divergence, if it is sustained for long. So what if the learning rate only exceeds its (‘recommended’) max value in quick “bursts”?

Programmatically, this looked like (fastai2 code)…

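import math
from fastai2.vision.all import *  # assumed: the usual fastai2 star import, which supplies annealer, SchedCos, Learner, etc.

@annealer
def SchedCosSpike(start, end, pos, amp=0.3, freq=50, offset=.8):
    "Cosine anneal from `start` to `end`, scaled by an oscillating 'spike' factor"
    cos_anneal = start + (1 + math.cos(math.pi*(1-pos))) * (end-start) / 2
    spike = 1 + amp*(math.sin(freq*math.pi*(1-pos)) + offset)  # swings between 1+amp*(offset-1) and 1+amp*(offset+1)
    return cos_anneal * spike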

def combined_cos_spike(pct, start, middle, end):
    "Return a scheduler with cosine annealing and spikes from `start`→`middle` & `middle`→`end`"
    return combine_scheds([pct,1-pct], [SchedCos(start, middle), SchedCosSpike(middle, end)])

@patch
def godziLR(self:Learner, n_epoch, lr_max=None, div=25., div_final=1e5, pct_start=0.25, wd=defaults.wd,
                  moms=(0.95, 0.35, 0.95), cbs=None, reset_opt=False):
    "Fit `self.model` for `n_epoch` using a 1cycle-style policy with spikes added to the LR annealing phase."
    if self.opt is None: self.create_opt()
    self.opt.set_hyper('lr', self.lr if lr_max is None else lr_max)
    lr_max = np.array([h['lr'] for h in self.opt.hypers])
    scheds = {'lr': combined_cos_spike(pct_start, lr_max/div, lr_max, lr_max/div_final),
              'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
    self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)

The name godziLR is a visual pun on how the LR schedule looks when graphed (depending on how you choose amp, offset and freq); one example looks like this:

Initially I tried leaving the momentum the same as in fit_one_cycle, then later tried (as in the code above) dropping the momentum way down (so that we don’t ‘shoot’ the optimizer off on some fast trajectory). I tried varying the amp, offset and freq parameters quite a bit.
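
To get a feel for what amp, freq and offset do to the numbers, here is a minimal pure-Python sketch of the same schedule, with no fastai dependency. The names cos_anneal, cos_spike and spiky_one_cycle are made up for this illustration, and lr_max=1e-3 is an arbitrary assumed value; the formulas are copied from the code above. With the defaults, the spike factor oscillates between 1+amp*(offset-1) = 0.94 and 1+amp*(offset+1) = 1.54 and averages 1+amp*offset = 1.24, so the schedule raises the average LR in the annealing phase as well as adding bursts.

```python
import math

def cos_anneal(start, end, pos):
    # Plain cosine annealing from `start` (pos=0) to `end` (pos=1)
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def cos_spike(start, end, pos, amp=0.3, freq=50, offset=0.8):
    # Same cosine annealing, multiplied by an oscillating 'spike' factor
    spike = 1 + amp * (math.sin(freq * math.pi * (1 - pos)) + offset)
    return cos_anneal(start, end, pos) * spike

def spiky_one_cycle(pct, pct_start=0.25, lr_max=1e-3, div=25.0, div_final=1e5):
    # pct is the fraction of training completed, in [0, 1]
    if pct < pct_start:
        # Warm-up phase: ordinary cosine ramp from lr_max/div up to lr_max
        return cos_anneal(lr_max / div, lr_max, pct / pct_start)
    # Annealing phase: spiky cosine from lr_max down to lr_max/div_final
    pos = (pct - pct_start) / (1 - pct_start)
    return cos_spike(lr_max, lr_max / div_final, pos)

# Sample the schedule at 1001 evenly spaced points
lrs = [spiky_one_cycle(i / 1000) for i in range(1001)]
peak_ratio = max(lrs) / 1e-3       # how far the spikes overshoot lr_max
start_ratio = lrs[250] / 1e-3      # LR at the start of the annealing phase
print(f"peak = {peak_ratio:.2f} x lr_max, annealing starts at {start_ratio:.2f} x lr_max")
```

With these defaults the peak lands at roughly 1.5× lr_max shortly after the warm-up ends, and the annealing phase begins at 1.24× lr_max rather than at lr_max itself, so there is a jump in LR at the phase boundary.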

…So far nothing has produced any appreciable changes to the final accuracy (compared to the default LR schedule), using models such as xresnet18 up through xse_resnext101, for 5 to 20 epochs.

This is only 24 hours’ work, and its motivation may have been naive, but I often like to answer questions of “would this work?” by actually trying it. Nice to have @jeremy’s Imagenette to try ideas out on!

If anyone has any thoughts on this – whether a theoretical reason why this was a waste of time, or ideas that might actually improve on it – let me know!

Have you tried it with Ranger? When we train with it, we keep the learning rate extremely high for a long period of time, I’m curious how it would react with your schedule. I’d suggest trying ImageWoof :slight_smile:

Thanks! Yes, I’ve been using ranger in my tests. ImageWoof will be next on my list.

Actually, fit_flat_cos outperforms my idea and the default in my tests so far. A few minutes ago I convinced myself that this was because Imagenette is a subset of Imagenet – where the initial weights come from – and thus there is no need for a ‘ramp up’ period at the start of the schedule.
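
For reference, the shape of a flat-then-cosine schedule can be sketched in plain Python: hold the LR constant, then cosine-anneal it toward zero, with no ramp-up at the start. This is a sketch under assumptions, not fastai's implementation: the function name flat_cos_lr is made up here, lr=1e-3 is arbitrary, and the defaults (flat for the first 75% of training, final LR = lr/1e5) are what fit_flat_cos uses as far as I can tell.

```python
import math

def flat_cos_lr(pct, lr=1e-3, pct_start=0.75, div_final=1e5):
    # pct is the fraction of training completed, in [0, 1]
    if pct < pct_start:
        # Flat phase: no warm-up, just hold the LR constant
        return lr
    # Cosine phase: anneal from lr down to lr/div_final
    pos = (pct - pct_start) / (1 - pct_start)
    end = lr / div_final
    return lr + (1 + math.cos(math.pi * (1 - pos))) * (end - lr) / 2

# Sample the schedule at 101 evenly spaced points
schedule = [flat_cos_lr(i / 100) for i in range(101)]
print(schedule[0], schedule[74], schedule[-1])
```

Unlike the one_cycle shape, the LR here never rises: it starts at its maximum and only ever decreases.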

For Imagenette and ImageWoof, it’s recommended to train without a pretrained model. That way we see results that don’t benefit from pretraining :slight_smile: (it’s also what the leaderboard requires).

@muellerzr Yes, thanks for the reminder! Earlier I was looking up how to initialize with random weights instead of the pre-trained weights, but I couldn’t figure out how to do it… and now, looking again, I see: it’s as simple as setting pretrained=False when I define the Learner, isn’t it?

Yes, that’s exactly it :slight_smile:

Follow-up: I tried adding the ‘spikes’ to the fit_flat_cos LR schedule in the notebook at the top of the ImageWoof leaderboard for 5 epochs, and was not able to gain any improvement in the results for the ranges of parameters I tried.

I was able to make things a little bit worse though, so at least my idea has some effect. :sweat_smile:
