Different implementations of differential learning rate (discriminative fine-tuning)

In the course we pass an array of learning rates to fine tune layers of different levels, but in Howard and Ruder (2018) it says


Is this newly employed method preferable to the former? Also, \eta^{l} is the learning rate we found by using lr_find() right?