Flat and anneal scheduler

I’ve heard of the over9000 optimizer before; it’s a really powerful optimizer. However, its experiments used the flat-and-anneal scheduler, which I had not heard of, and I am not very familiar with fastai, so I can’t understand the scheduler’s code. Can someone implement it in PyTorch so I can get a better understanding of the scheduler?


The callback we built is here:

Basically, you stay at a constant high learning rate for ~72% of training, and the last ~28% follows a cosine function as it anneals down.

A better visualization is the fastai2 implementation in the notebook:

Thank you very much for your answer, which helps me a lot!

Sorry, I have another question: in the last cosine-annealing section, is T_max the number of steps in that section?

I don’t quite follow here. Which example do you mean? Can you post the code you mention?

I’m sorry I didn’t express myself clearly.

phase1 = TrainingPhase(batch_finish).schedule_hp('lr', lr, anneal=curve_type).schedule_hp('mom', moms[1])

This part uses cosine annealing to adjust the lr.


I don’t know why I have this idea, but I wonder if it’s right:

Is T_max = batch_finish?
Or do you define T_max yourself?