Yes, say you have 3 layer groups: group 1, 2 and 3. max_lr=slice(1) means that the learning rate for group 3 is 1, and 0.1 for groups 1 and 2. max_lr=1 means the learning rate is 1 for groups 1, 2 and 3.
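To make the mapping above concrete, here is a minimal sketch in plain Python. It does not call fastai; `group_lrs` is a hypothetical helper written only for illustration, and it assumes the behaviour described above (a plain number goes to every group, while `slice(stop)` gives `stop` to the last group and `stop / 10` to the others).

```python
def group_lrs(max_lr, n_groups):
    """Hypothetical helper: expand max_lr into one learning rate per layer group."""
    if not isinstance(max_lr, slice):
        # A plain number is used for every group.
        return [max_lr] * n_groups
    if max_lr.start is None:
        # slice(stop): the last group gets stop, earlier groups get stop / 10.
        return [max_lr.stop / 10] * (n_groups - 1) + [max_lr.stop]
    # slice(start, stop) is deliberately left out of this sketch.
    raise NotImplementedError("slice(start, stop) is not covered here")


print(group_lrs(slice(1), 3))  # [0.1, 0.1, 1]
print(group_lrs(1, 3))         # [1, 1, 1]
```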
A model’s weights/parameters can be divided into different groups, called “parameter groups” or “layer groups”. You can give each group a different learning rate, and then during training, parameters from different groups will be updated using these different learning rates.
How is that going to help us, I mean having a different lr for different groups?
Besides that, doesn’t slice in Python take these parameters: slice(start, stop, step)? I can’t recognize them in this line of code: learn.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3), moms=(0.8,0.7))
Applying a different lr to each group is a technique called “discriminative layer training”, which is introduced in part 1. It is commonly used in both computer vision and natural language processing. You can refer to this fastai doc for more details.
I am not sure I fully understand your second question. Are you asking why slice() can be called with only 1 or 2 arguments when it has 3 parameters?
slice() can be called with just 1 or 2 arguments; whatever you leave out defaults to None. Below is a short experiment for your reference:
In : slice(5)
Out: slice(None, 5, None)
In : slice(1, 5)
Out: slice(1, 5, None)
Therefore, in your last line, slice(5e-3/(2.6**4),5e-3) is equivalent to slice(5e-3/(2.6**4), 5e-3, None), that is, start = 5e-3/(2.6**4), stop = 5e-3, step = None.
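A small sketch in plain Python (no fastai needed) showing that a slice object is just a container with `start`, `stop` and `step` attributes, which is why it can be used to carry a pair of learning rates. The variable names are only for illustration.

```python
# Build the same slice that is passed to fit_one_cycle in the question.
lr_slice = slice(5e-3 / (2.6 ** 4), 5e-3)

# The two numbers are stored as start and stop; step was omitted, so it is None.
print(lr_slice.start, lr_slice.stop, lr_slice.step)

# With a single argument, only stop is set.
single = slice(5)
print(single.start, single.stop, single.step)  # None 5 None
```

Note that slice() takes positional arguments only, so the `start =`, `stop =` spelling is just a way of naming the fields, not valid call syntax.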
In this case, slice(1e-5, 1e-4) is essentially slice(1e-5, 1e-4, None), that is, start = 1e-5, stop = 1e-4, step = None.
From the source code (fastai2), it triggers discriminative learning rates: 1e-5 is applied to the first layer group (the layers closest to the input) and 1e-4 to the last layer group (the head). The groups in between get learning rates between 1e-5 and 1e-4, spaced in a geometric progression.
See the definition of set_hyper for more details on how the learning rates distribute among layers: link
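As a rough illustration of the geometric spacing described above, here is a minimal sketch in plain Python. `geometric_lrs` is a hypothetical helper, not fastai's code (fastai has its own helper for this, even_mults if I remember correctly, and it may differ in details), and the choice of 3 layer groups is an assumption made just for the example.

```python
def geometric_lrs(start, stop, n_groups):
    """Hypothetical helper: n_groups rates from start to stop with a constant ratio."""
    if n_groups == 1:
        return [stop]
    # Constant multiplier between consecutive groups.
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]


# Assumed setup: 3 layer groups and slice(1e-5, 1e-4).
for i, lr in enumerate(geometric_lrs(1e-5, 1e-4, 3), start=1):
    print(f"group {i}: {lr:.2e}")
# group 1: 1.00e-05
# group 2: 3.16e-05
# group 3: 1.00e-04
```

Each group's rate is the previous one multiplied by the same factor (about 3.16 here), rather than the previous one plus a fixed amount.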
I am currently working on a melanoma image classification project using FastAI. I have 3 different layer groups: the first two belong to the densenet161 model and the last is a fully connected layer group.
In training, I first freeze the model and train it for a few epochs with a small learning rate (found with the lr_find() function). Then I unfreeze the model and train it using the slice function. I am confused about which two learning rates to choose. Is there a way to approximate these rates using the lr_find() function?