One Cycle LR and Adam

Hi! I have read some blog posts stating that CLR (cyclical learning rates) works best with SGD. One blog post also stated that Adam and CLR cannot be used simultaneously.

However, is it possible to combine the One Cycle LR scheduler with the Adam optimizer? I tried it and it seems to work. I just don’t clearly understand why it works when the blog post said it would not. If someone could shed some light on this matter, it would be deeply appreciated.

Thanks and have a productive 2022 ahead!

It’s actually exactly wrong. The fastai default is Adam + OneCycle, and that’s what is taught in the courses as well.
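For concreteness, here’s a minimal plain-PyTorch sketch of that pairing; the tiny model and fake data are just placeholders, so swap in your own model and DataLoader:

```python
import torch
import torch.nn as nn

# Placeholder model and data just to make the sketch runnable.
model = nn.Linear(10, 2)
data = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(50)]

epochs = 3
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=len(data)
)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for xb, yb in data:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
        scheduler.step()  # OneCycleLR is stepped every batch, not every epoch
```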


I see. Yes, I do realize that. So does that mean OneCycle LR can work with almost any optimizer? Also, does that mean the blog post is misleading?

Thanks for the reply, by the way.

Correct. Specifically, you need to check whether the warm-up (the first half of the one-cycle schedule) is already baked into the optimizer. If it isn’t, then yup, OneCycle works. If it is (as with Ranger), then you use fit_flat_cos (or cosine annealing) instead, with the annealing starting around 3/4 of the way through training (see the sketch at the end of this reply).

And yes, the article is misleading.

(What it says about cosine annealing isn’t wrong, though; it has recently become popular to use with Adam as well.)
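In fastai terms, the two cases look roughly like this (a sketch assuming a recent fastai version; MNIST_TINY and resnet18 are just stand-ins for your own data and model):

```python
from fastai.vision.all import *

path = untar_data(URLs.MNIST_TINY)            # tiny sample dataset
dls = ImageDataLoaders.from_folder(path)

# Adam (the fastai default opt_func) has no built-in warm-up,
# so the one-cycle schedule supplies both the warm-up and the annealing.
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fit_one_cycle(1, lr_max=1e-3)

# Ranger has warm-up baked into the optimizer, so keep the LR flat
# and cosine-anneal only near the end (pct_start defaults to 0.75).
learn = vision_learner(dls, resnet18, opt_func=ranger, metrics=accuracy)
learn.fit_flat_cos(1, lr=1e-3)
```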

A small update from the author: they’re updating the post to fix this, as the claim is indeed wrong. Here’s also a related PyTorch issue stating the same: OneCycleLr : which optimizer · Issue #36133 · pytorch/pytorch · GitHub