Cyclic learning policies

I am a fast.ai student currently working on Lesson 3 of Part 1, but I am interested in getting my hands dirty and working on a Kaggle competition, the Human Protein Atlas competition. Looking at this kernel I saw the use_clr argument. It looks like it is for a CyclicLR class, which implements a triangular policy for cyclic learning rates. When is this better than using cosine annealing for cyclic learning rates? If this is addressed in later lessons, could you please let me know which one?
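For reference, here is a minimal sketch of the two schedules being compared. This is not the fastai or the kernel's implementation, just the standard formulas: the triangular policy from Smith's CLR paper and plain cosine annealing with restarts. The parameter names (`step_size`, `cycle_len`, `base_lr`, `max_lr`) are my own choices.

```python
import math

def triangular_lr(it, step_size, base_lr=1e-4, max_lr=1e-2):
    """Triangular cyclic LR: ramps linearly from base_lr up to max_lr
    and back down over one cycle of 2 * step_size iterations."""
    cycle = math.floor(1 + it / (2 * step_size))
    x = abs(it / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

def cosine_annealing_lr(it, cycle_len, base_lr=1e-4, max_lr=1e-2):
    """Cosine annealing with warm restarts: decays from max_lr to
    base_lr along a half-cosine, then jumps back to max_lr."""
    t = it % cycle_len
    return base_lr + 0.5 * (max_lr - base_lr) * (1 + math.cos(math.pi * t / cycle_len))
```

The visible difference: the triangular policy spends the first half of each cycle *increasing* the learning rate, while cosine annealing starts each cycle at the maximum and only decays.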

(Also, this is my first post here, if this is the incorrect place, please let me know and I will move it to a different category. I had posted under Part 1 but got no response, so I assume this was probably covered under Part 2)

I would recommend reading Leslie Smith’s papers on this topic - fit_one_cycle() is based on his work.

  1. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
  2. Cyclical Learning Rates for Training Neural Networks
  3. This one is VERY good (most recent publication): A Disciplined Approach to Neural Network Hyper-Parameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay