ssf
(sun shuo feng )
March 4, 2020, 4:35am
1
I’ve heard of the Over9000 optimizer before; it’s a really powerful optimizer. However, its experiments used the flat-and-anneal scheduler, which I hadn’t heard of, and I’m not very familiar with fastai, so I can’t understand the scheduler’s code. Could someone implement it in plain PyTorch so I can understand the scheduler better?
Thanks!
muellerzr
(Zachary Mueller)
March 4, 2020, 4:37am
3
ssf:
However, its experiments used the flat-and-anneal scheduler, which I hadn’t heard of, and I’m not very familiar with fastai.
The callback itself we built is here:
https://github.com/fastai/fastai/blob/master/fastai/callbacks/flat_cos_anneal.py
Basically you stay at a constant high learning rate for ~72% of training, and for the last ~28% the learning rate follows a cosine annealing curve down toward zero.
A better visualization is the fastai2 implementation in its notebook.
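Since the original question asked for a plain-PyTorch version, here is a minimal sketch of the same idea using LambdaLR. The 0.72 flat fraction, the function names, and the step counts are illustrative choices, not fastai’s actual code:

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def flat_cos_schedule(total_steps, flat_pct=0.72):
    """Factor function for LambdaLR: keep the base LR for the first
    `flat_pct` of steps, then cosine-anneal it down toward zero."""
    flat_steps = int(total_steps * flat_pct)
    anneal_steps = max(1, total_steps - flat_steps)

    def lr_factor(step):
        if step < flat_steps:
            return 1.0                                   # flat phase: base LR unchanged
        progress = (step - flat_steps) / anneal_steps    # 0 -> 1 over the anneal phase
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return lr_factor

# usage sketch: one scheduler.step() per batch
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
total_steps = 1000  # e.g. n_epochs * batches_per_epoch
scheduler = LambdaLR(optimizer, lr_lambda=flat_cos_schedule(total_steps))

for step in range(total_steps):
    # ... forward / backward / optimizer.step() ...
    scheduler.step()
```

Note this steps per batch, not per epoch, so `total_steps` should be the total number of training iterations.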
ssf
(sun shuo feng )
March 4, 2020, 4:57am
4
Thank you very much for your answer, which helped me a lot!
ssf
(sun shuo feng )
March 4, 2020, 7:34am
5
Sorry, I have another question: in the final cosine-annealing section, is T_max the number of steps in that section?
muellerzr
(Zachary Mueller)
March 4, 2020, 7:37am
6
I don’t quite follow here. Which example do you mean? Can you post the code you’re referring to?
ssf
(sun shuo feng )
March 4, 2020, 7:51am
7
I’m sorry I didn’t express myself clearly.
phase1 = TrainingPhase(batch_finish).schedule_hp('lr', lr, anneal=curve_type).schedule_hp('mom', moms[1])
This part uses cosine annealing to adjust the lr.
I don’t know why I have this idea, but I wonder if it’s right:
is T_max = batch_finish?
Or do you define T_max yourself?
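In plain PyTorch terms, T_max in CosineAnnealingLR is the number of iterations the cosine phase runs for, i.e. the length of that final section rather than the whole run. Here is a hedged sketch of how that maps onto the flat-then-anneal idea; the step counts and variable names are illustrative, not taken from fastai:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

total_steps = 1000                        # total training iterations (hypothetical)
flat_steps = int(total_steps * 0.72)      # flat phase length
anneal_steps = total_steps - flat_steps   # length of the cosine phase

# T_max is the number of iterations over which the LR is annealed,
# so it corresponds to the step count of the final (annealing) section only.
scheduler = CosineAnnealingLR(optimizer, T_max=anneal_steps, eta_min=0.0)

for step in range(total_steps):
    # ... forward / backward / optimizer.step() ...
    if step >= flat_steps:                # stay flat first, then start annealing
        scheduler.step()
```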