# Different implementations of differential learning rate (discriminative fine-tuning)

(Alex Lee) #1

In the course we pass an array of learning rates to fine tune layers of different levels, but in Howard and Ruder (2018) it says

Is this newly employed method preferable to the former? Also, \eta^{l} is the learning rate we found by using lr_find() right?