ssf
(sun shuo feng )
March 4, 2020, 4:35am
1
I’ve heard of the over9000 optimizer before; it’s a really powerful optimizer. However, the experiments used a flat-and-anneal scheduler, which I had not heard of, and I am not very familiar with fastai, so I can’t understand the scheduler’s code. Could someone implement it in PyTorch so I can get a better understanding of the scheduler?
Thanks!
muellerzr
(Zachary Mueller)
March 4, 2020, 4:37am
3
ssf:
However, the experiments used a flat-and-anneal scheduler, which I had not heard of, and I am not very familiar with fastai.
The callback itself we built is here:
"Supports flat-cosine-annealing style training"
from ..core import *
from ..callback import *
from ..callbacks import *
from ..basic_train import Learner, LearnerCallback

__all__ = ['FlatCosAnnealScheduler']

# A new scheduler by Mikhail Grankin aimed at use with the new optimizers
def FlatCosAnnealScheduler(learn, lr:float=4e-3, tot_epochs:int=1, moms:Floats=(0.95,0.999),
                           start_pct:float=0.72, curve='cosine'):
    "Manage FCFit training as found in the ImageNette experiments"
    n = len(learn.data.train_dl)
    anneal_start = int(n * tot_epochs * start_pct)
    batch_finish = ((n * tot_epochs) - anneal_start)
    if curve=="cosine": curve_type=annealing_cos
    elif curve=="linear": curve_type=annealing_linear
    elif curve=="exponential": curve_type=annealing_exp
(file truncated)
Basically you stay at a constant high learning rate for ~72% of training, and for the last ~28% the learning rate follows a cosine curve down.
A better visualization is the fastai2 implementation in the notebook:
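Since the original question asked for a PyTorch-style version: here is a minimal plain-Python sketch of that flat-then-cosine schedule. The 72% split and the cosine shape come from the code above; the function name and signature are my own, not fastai's.

```python
import math

def flat_cos_lr(step, total_steps, lr=4e-3, start_pct=0.72):
    """Flat at `lr` for the first `start_pct` of training, then
    cosine-anneal down to 0 (mirrors fastai's annealing_cos with end=0).
    A sketch, not the exact fastai implementation."""
    anneal_start = int(total_steps * start_pct)
    if step < anneal_start:
        return lr                      # flat phase: constant high LR
    # fraction of the way through the annealing phase, in [0, 1]
    pct = (step - anneal_start) / (total_steps - anneal_start)
    return lr / 2 * (1 + math.cos(math.pi * pct))
```

To drive a real optimizer with it, one could wrap it in `torch.optim.lr_scheduler.LambdaLR`, returning `flat_cos_lr(step, total_steps, lr) / lr` as the multiplicative factor.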
#export
from fastai2.basics import *

from nbdev.showdoc import *

(file truncated)
ssf
(sun shuo feng )
March 4, 2020, 4:57am
4
Thank you very much for your answer; it helped me a lot!!
ssf
(sun shuo feng )
March 4, 2020, 7:34am
5
Sorry, I have another question: in the final cosine-annealing section, is T_max the number of steps of that section?
muellerzr
(Zachary Mueller)
March 4, 2020, 7:37am
6
I don’t quite follow; which example do you mean? Can you post the code you’re referring to?
ssf
(sun shuo feng )
March 4, 2020, 7:51am
7
I’m sorry I didn’t express myself clearly.
phase1 = TrainingPhase(batch_finish).schedule_hp('lr', lr, anneal=curve_type).schedule_hp('mom', moms[1])
This part uses cosine annealing to adjust the learning rate.
I don’t know why I have this idea, but I wonder if it’s right:
is T_max = batch_finish, or do you define T_max yourself?
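For what it’s worth, the annealing half of fastai’s annealing_cos (going from lr down to 0 over batch_finish steps) matches the closed-form formula behind PyTorch’s CosineAnnealingLR when T_max = batch_finish and eta_min = 0, which suggests T_max would indeed be the number of annealing steps rather than something chosen separately. A small sketch comparing the two formulas (the helper names here are my own; only the formulas are taken from the two libraries):

```python
import math

def fastai_annealing_cos(start, end, pct):
    # fastai's cosine annealing: start -> end as pct goes 0 -> 1
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def torch_cosine_annealing(base_lr, eta_min, t, T_max):
    # Closed-form schedule used by torch.optim.lr_scheduler.CosineAnnealingLR
    return eta_min + (base_lr - eta_min) / 2 * (1 + math.cos(math.pi * t / T_max))

lr, batch_finish = 4e-3, 1000
for t in range(batch_finish + 1):
    a = fastai_annealing_cos(lr, 0.0, t / batch_finish)
    b = torch_cosine_annealing(lr, 0.0, t, T_max=batch_finish)
    assert abs(a - b) < 1e-12   # identical schedules step for step
```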