I am intending to use the fit_sgdr() method to train a vision model, and initially I have received significantly better performance using it over fine_tune().
My understanding is that it uses a method similar to cyclical learning rates, but instead of a triangular schedule it uses cosine annealing.
It starts with a high learning rate and decreases it over a cycle whose length is supplied via the cycle_len parameter of the method. After a cycle finishes, it restarts from the high learning rate.
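For intuition, here is a minimal sketch of that schedule. This is my own illustration, not fastai's code; the function name sgdr_lr, the lr_min default, and the cycle_mult lengthening are assumptions on my part:

```python
import math

def sgdr_lr(epoch, lr_max, lr_min=0.0, cycle_len=1, cycle_mult=2):
    """Cosine-annealed learning rate with warm restarts (a sketch of the
    SGDR idea, not fastai's actual implementation)."""
    # Find which cycle `epoch` falls in; each cycle is cycle_mult times
    # longer than the previous one.
    start, length = 0, cycle_len
    while epoch >= start + length:
        start += length
        length *= cycle_mult
    t = (epoch - start) / length  # position within the current cycle, in [0, 1)
    # Cosine anneal from lr_max down towards lr_min over the cycle.
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# At the start of each cycle the learning rate jumps back to lr_max:
lrs = [sgdr_lr(e, lr_max=1e-2) for e in range(3)]
```

Epochs 0 and 1 here both start a cycle (cycle lengths 1 and 2), so both return lr_max, while epoch 2 sits halfway through the second cycle and returns half of it.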
The paper suggests that it might yield better results if the upper and lower bounds are decreased progressively with each restart. Is it implemented by fastai? From reading the source code, I think it is:
pcts = [cycle_len * cycle_mult**i / n_epoch for i in range(n_cycles)]
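Plugging hypothetical values into that line (the values below are my own picks, not defaults I have verified) shows it computes the fraction of training spent in each cycle, with each cycle cycle_mult times longer than the previous one:

```python
# Hypothetical values, just to see what the quoted line produces.
cycle_len, cycle_mult, n_cycles = 1, 2, 3

# Total epochs across all cycles: 1 + 2 + 4 = 7.
n_epoch = sum(cycle_len * cycle_mult**i for i in range(n_cycles))

# The quoted line from the source: fraction of training in each cycle.
pcts = [cycle_len * cycle_mult**i / n_epoch for i in range(n_cycles)]
print(pcts)  # each cycle is twice as long as the one before
```

Note this line only controls how long each cycle lasts, not what the learning-rate bounds are within it.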
I am also new to schedules other than fine_tune(), so feel free to correct me if I am wrong anywhere.
When picking a learning rate for fine_tune() or for fit_one_cycle() directly, we choose the point with the steepest gradient, i.e. where the loss has fallen most quickly. Should the lr_max parameter for fit_sgdr() be chosen any differently, and should I instead pick the point where the loss starts to rise again?
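To make the two candidate points concrete, here is a toy curve comparing them. This is synthetic data of my own, not real lr_find() output; the parabola is just a stand-in for a loss curve that falls, bottoms out, and rises:

```python
import numpy as np

# Synthetic stand-in for an lr_find-style curve (not real output):
# loss falls, bottoms out, then rises.
xs = np.linspace(-5, 0, 101)   # log10 of the learning rate
loss = (xs + 2) ** 2           # minimum at log10(lr) = -2

grad = np.gradient(loss, xs)
steepest = xs[np.argmin(grad)]  # where the loss falls fastest
valley = xs[np.argmin(loss)]    # just before the loss starts rising

print(10 ** steepest, 10 ** valley)
```

On this toy curve the two choices differ by three orders of magnitude, which is exactly why the question of which one to pass as lr_max matters.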