I assume that STLR is used to jump out of local minima and resume learning, but there are other LR scheduling techniques such as tf.train.cosine_decay_restarts. I also haven't seen STLR applied in any other papers. So what would the difference be if some other learning-rate decay technique with restarts were applied instead?
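For reference, here is a minimal sketch of the two schedules being compared: STLR as defined in the ULMFiT paper (Howard & Ruder, 2018) and cosine decay with warm restarts (SGDR, which is what tf.train.cosine_decay_restarts implements). The parameter values below are illustrative defaults, not taken from any specific paper:

```python
import math

def stlr(t, T=100, cut_frac=0.1, ratio=32, lr_max=0.01):
    """Slanted triangular LR: a short linear warm-up to lr_max,
    then one long linear decay back down. No restarts."""
    cut = math.floor(T * cut_frac)
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio

def cosine_restarts(t, first_period=50, t_mul=2.0, lr_max=0.01, lr_min=0.0):
    """Cosine decay with warm restarts: the LR follows a cosine curve
    down to lr_min, then jumps back to lr_max; each period is t_mul
    times longer than the last."""
    period = first_period
    while t >= period:
        t -= period
        period = int(period * t_mul)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))
```

The key structural difference is visible here: STLR is a single triangle over the whole run, whereas the cosine schedule periodically resets the LR to its maximum, which is what would let training escape a sharp minimum late in a run.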
Thanks.