Cyclical learning rates are widely used in the course. However, I have yet to find a principled approach to choosing their hyper-parameters.
lr_find helps choose the maximum learning rate, but not the other parameters. For example, the imdb notebook in dl2 uses
use_clr=(32,10) in one place and
use_clr=(20,10) in another, and I am having trouble deciding how to set these. As I understand it, the first parameter sets the lowest learning rate (the maximum divided by this value), and the second sets what fraction of the cycle is spent increasing the rate versus decreasing it.
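To make my understanding concrete, here is a minimal sketch of the schedule I think these two numbers describe. It is not the library's code: the function name `clr_schedule` and its arguments are made up for illustration, and it assumes the first value divides the maximum learning rate to give the minimum, while the second means the rate rises for 1/cut_div of the iterations and falls for the rest.

```python
def clr_schedule(max_lr, div, cut_div, n_iters):
    """Hypothetical triangular schedule (an assumption, not the library's code).

    Starts at max_lr / div, rises linearly to max_lr over the first
    n_iters // cut_div iterations, then decays linearly back towards max_lr / div.
    """
    if n_iters < cut_div:
        raise ValueError("n_iters must be at least cut_div")
    cut_pt = n_iters // cut_div  # iteration at which the peak is reached
    lrs = []
    for i in range(n_iters):
        if i <= cut_pt:
            pct = i / cut_pt  # growth phase: 0 -> 1
        else:
            pct = 1 - (i - cut_pt) / (n_iters - cut_pt)  # descent phase: 1 -> 0
        # Interpolate between max_lr / div (pct = 0) and max_lr (pct = 1)
        lrs.append(max_lr * (1 + pct * (div - 1)) / div)
    return lrs


# Compare the two settings from the notebook for the same (assumed) maximum rate
for div, cut_div in [(32, 10), (20, 10)]:
    lrs = clr_schedule(max_lr=1e-2, div=div, cut_div=cut_div, n_iters=1000)
    print(div, cut_div, min(lrs), max(lrs), lrs.index(max(lrs)))
```

If this reading is right, the two settings differ only in how low the rate starts (1e-2/32 versus 1e-2/20 in this example), while both spend the first tenth of the cycle increasing the rate.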
Any comments on choosing these parameters are appreciated.