I have a few questions about the learning rate, how it works in a `Learner`, and the `max_lr` parameter in `fit_one_cycle()`:
- According to the docs, the default argument for `max_lr` in `fit_one_cycle()` is `slice(None, 0.03, None)`. So, if I pass `max_lr = 0.02`, is that equivalent to `max_lr = slice(None, 0.02, None)`, i.e., the learning rate is sliced from 0 to 0.02 between the first and final layers? Or is it simply `0.02`, i.e., all the layers have a learning rate of 0.02?
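To make the two readings concrete, here is a small sketch in plain Python (no fastai). The `lr/10` rule for earlier groups when the slice start is `None` is my assumption based on fastai v1's `lr_range`, not something I've confirmed in the docs:

```python
# Hypothetical sketch contrasting the two readings of max_lr = 0.02,
# assuming the model is split into 3 layer (parameter) groups.
n_groups = 3
max_lr = 0.02

# Reading 1: a bare float applies the same lr to every layer group.
uniform = [max_lr] * n_groups

# Reading 2: treated like slice(None, 0.02, None), the lr is spread
# across groups (shown here with a fastai-v1-style lr/10 for earlier
# groups when slice.start is None -- an assumption on my part).
spread = [max_lr / 10] * (n_groups - 1) + [max_lr]

print(uniform)  # [0.02, 0.02, 0.02]
print(spread)   # [0.002, 0.002, 0.02]
```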
- When I pass `max_lr = slice(1e-5, 1e-3)`, it means that the first layer is trained with `lr=1e-5`, the final layer with `lr=1e-3`, and the intermediate layers are trained with values in between `1e-5` and `1e-3`. However, the plot generated by `learn.recorder.plot_lr()` shows a single curve. Which lr is shown? Is it the lr of the last layer, the first layer, or the average of all lrs?
- Is there any attribute that shows the learning rate of a `Learner`? The closest I could find was `learn.opt`, which shows the optimizer, which in turn shows the individual learning rate of each parameter group. Is there any other way to get the current learning rate(s) of a `Learner`?
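For context, this is roughly what I am doing now via `learn.opt`, sketched with a plain-dict stand-in for PyTorch's `param_groups` structure (the layer names and lr values are hypothetical):

```python
# Hypothetical sketch: reading the current lr(s) straight from the
# optimizer's parameter groups. A list of dicts mimics the shape of
# torch.optim.Optimizer.param_groups, since no real model is attached.
param_groups = [
    {"params": ["layer0.weight"], "lr": 1e-5},
    {"params": ["layer1.weight"], "lr": 1e-4},
    {"params": ["layer2.weight"], "lr": 1e-3},
]

# One lr per parameter group, in group order.
current_lrs = [g["lr"] for g in param_groups]
print(current_lrs)  # [1e-05, 0.0001, 0.001]
```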