Any ideas how to translate the optimizer schedule in “Attention Is All You Need” to fastai?

Here is the definition from the paper:

5.3 Optimizer
We used the Adam optimizer with β₁ = 0.9, β₂ = 0.98 and ε = 10⁻⁹. We varied the learning rate over the course of training, according to the formula:

lrate = d_model**(-0.5) * min(step_num**(-0.5), step_num * warmup_steps**(-1.5))

This corresponds to increasing the learning rate linearly for the first warmup_steps training steps,
and decreasing it thereafter proportionally to the inverse square root of the step number. We used
warmup_steps = 4000.
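
Written out as plain Python (with d_model = 512 as in the paper's base model), the schedule is just:

```python
def noam_lr(step, d_model=512, warmup_steps=4000):
    "Linear warmup for warmup_steps, then decay proportional to step**-0.5."
    step = max(step, 1)  # step 0 would divide by zero; the paper counts steps from 1
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```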

I'm not sure how to take that formula and convert it into the appropriate schedule_hp call(s).
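
The closest I've come up with is splitting training into two TrainingPhases driven by a GeneralScheduler: a linear warmup phase, then a second phase with a custom annealing function for the inverse-sqrt decay. This is untested, and the import paths and call signatures are only my best reading of the fastai v1 docs, so treat it as a sketch (learn and n_epochs stand in for your own Learner and epoch count):

```python
# Untested sketch against fastai v1's TrainingPhase / GeneralScheduler API;
# import paths and signatures are my best guess from the docs.
from fastai.callbacks.general_sched import TrainingPhase, GeneralScheduler
from fastai.callback import annealing_linear

d_model, warmup = 512, 4000                        # values from the paper (base model)
peak_lr = (d_model * warmup) ** -0.5               # point where the two branches of min() meet

total_steps = n_epochs * len(learn.data.train_dl)  # learn / n_epochs are your own objects
decay_steps = total_steps - warmup

def inv_sqrt_anneal(start, end, pct):
    "Continue the step**-0.5 decay; `end` is ignored, pct runs 0 -> 1 over the decay phase."
    step = warmup + pct * decay_steps
    return d_model ** -0.5 * step ** -0.5

phases = [
    TrainingPhase(warmup).schedule_hp('lr', (0., peak_lr), anneal=annealing_linear),
    TrainingPhase(decay_steps).schedule_hp('lr', (peak_lr, 0.), anneal=inv_sqrt_anneal),
]
learn.fit(n_epochs, callbacks=[GeneralScheduler(learn, phases)])
```

With d_model = 512 and warmup = 4000, peak_lr works out to about 7e-4, which is the value the warmup hands off to the decay phase. If anyone can confirm whether schedule_hp accepts a custom anneal function like this, that would help.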


I am also facing the same issue. If you have figured out the solution, can you share it here?