Sometimes I see people use both a sliced learning rate (slice(lr)) and a single learning rate when working with frozen models. While using a sliced learning rate for unfrozen models makes sense, what does it mean to use sliced learning rates for frozen models? In addition, when should one use sliced learning rates for frozen models as opposed to a single value for the learning rate? Does one way lead to consistently better results than the other?
From the docs:
If you pass just slice(end), then the last group’s learning rate is end, and all the other groups are end/10. For instance (for our learner that has 3 layer groups):
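What follows is not the example from the docs but a minimal plain-Python sketch of the rule quoted above. The helper name lr_range_sketch is hypothetical and this is not fastai's own code; the scalar case (every group gets the same rate) is an assumption about how a single value behaves, and the slice(start, end) form is deliberately left out.

```python
def lr_range_sketch(lr, n_groups):
    """Return one learning rate per layer group (hypothetical helper)."""
    if not isinstance(lr, slice):
        # Assumption: a single number is applied to every layer group.
        return [lr] * n_groups
    if lr.start is not None:
        # slice(start, end) is outside the rule quoted above.
        raise ValueError("only slice(end) is covered by this sketch")
    # slice(end): the last group gets end, all earlier groups get end / 10.
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]


# A learner with 3 layer groups, as in the quoted passage.
sliced = lr_range_sketch(slice(1e-3), 3)
single = lr_range_sketch(1e-3, 3)
print("slice(1e-3):", sliced)
print("1e-3:       ", single)
```

The two results differ only in the earlier groups, which is the part that matters once those groups are unfrozen.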
I don’t think it will change anything. Different learning rates are assigned to different layer groups, so if all the layer groups are frozen except the last one, then using lr or slice(lr) won’t make any difference.
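To make that argument concrete, here is a rough sketch in plain Python. The helpers per_group_rates and effective_rates are hypothetical names, not fastai functions, and the sketch assumes that a frozen group receives no updates at all, so its learning rate is simply never used.

```python
def per_group_rates(lr, n_groups):
    """Hypothetical helper: one rate per layer group."""
    if isinstance(lr, slice):
        # slice(end): last group gets end, earlier groups get end / 10.
        return [lr.stop / 10] * (n_groups - 1) + [lr.stop]
    # Assumption: a single value is applied to every group.
    return [lr] * n_groups


def effective_rates(lr, trainable):
    """Keep only the rates of groups that are actually being trained."""
    rates = per_group_rates(lr, len(trainable))
    return [r for r, t in zip(rates, trainable) if t]


frozen = [False, False, True]    # only the last group is trainable
unfrozen = [True, True, True]    # every group is trainable

same_when_frozen = effective_rates(1e-3, frozen) == effective_rates(slice(1e-3), frozen)
same_when_unfrozen = effective_rates(1e-3, unfrozen) == effective_rates(slice(1e-3), unfrozen)
print("identical when frozen:  ", same_when_frozen)
print("identical when unfrozen:", same_when_unfrozen)
```

Under these assumptions the choice between lr and slice(lr) only starts to matter after unfreezing, when the earlier groups are updated with the smaller rate.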