In Lesson 3 - planet, I saw these 2 lines of code:

lr = 0.01
learn.fit_one_cycle(5, slice(lr))

If it's slice(min_lr, max_lr), then I understand that fit_one_cycle() will spread learning rates between min_lr and max_lr across the layer groups. (Hopefully my understanding of this is correct.)

But in this case slice(lr) has only one argument.

What are the differences between fit_one_cycle(5, lr) and fit_one_cycle(5, slice(lr)) ?
And what are the benefits of using slice(lr) instead of lr directly?

With the former, every parameter group will use a learning rate of lr, whereas with the latter, the last parameter group will use a learning rate of lr, while the other groups will have lr/10.

Yes, say you have 3 layer groups: group 1, 2 and 3. max_lr=slice(1) means that the learning rate for group 3 is 1, and 0.1 for groups 1 and 2. max_lr=1 means the learning rate is 1 for groups 1, 2 and 3.
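The rule above can be sketched in plain Python. This is a simplified illustration only, not fastai's actual implementation, and the helper name per_group_lrs is made up:

```python
def per_group_lrs(lr_spec, n_groups):
    # Simplified sketch of how a max_lr argument maps to one learning
    # rate per layer group (not fastai's real code).
    if isinstance(lr_spec, slice) and lr_spec.start is None:
        # slice(lr): the last group gets lr, all earlier groups get lr/10
        return [lr_spec.stop / 10] * (n_groups - 1) + [lr_spec.stop]
    # plain number: every group gets the same learning rate
    return [lr_spec] * n_groups

print(per_group_lrs(1, 3))         # [1, 1, 1]
print(per_group_lrs(slice(1), 3))  # [0.1, 0.1, 1]
```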

A model's weights/parameters can be divided into different groups, called "parameter groups" or "layer groups". You can give each group a different learning rate, and then during training, parameters from different groups will be updated using these different learning rates.
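As a toy illustration (plain Python, not fastai code), here is one SGD step where two made-up parameter groups are updated with different learning rates, mirroring how optimizers such as torch.optim.SGD accept per-group settings:

```python
# Two hypothetical "parameter groups" with their own learning rates.
groups = [
    {"params": [1.0, 2.0], "lr": 0.001},  # e.g. early backbone layers
    {"params": [3.0], "lr": 0.01},        # e.g. the newly added head
]

def sgd_step(groups, grad=1.0):
    # p <- p - lr * grad, using each group's own learning rate
    for g in groups:
        g["params"] = [p - g["lr"] * grad for p in g["params"]]

sgd_step(groups)
print(groups[0]["params"], groups[1]["params"])
```

The head moves ten times further per step than the backbone, which is exactly the effect discriminative learning rates are after.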

How is this going to help us? I mean, what is the benefit of having a different lr for different groups?
Besides that, doesn't slice in Python take the parameters slice(start, stop, step)? I can't recognize these parameters in this line of code: learn.fit_one_cycle(1, slice(5e-3/(2.6**4), 5e-3), moms=(0.8, 0.7))

@ZahraHkh
Applying a different lr to different groups is a technique called "discriminative layer training" that is introduced in part 1. This technique is commonly used in both computer vision and natural language processing. You can refer to this fastai doc for more details.

I am not sure I fully understand your second question. Did it refer to why slice() is passed only 1 or 2 arguments here, while its full signature has 3?

slice() can be passed 1, 2, or 3 arguments; any omitted argument defaults to None, and with a single argument it is interpreted as stop. Below is a snippet of experiments for your reference:

In [9]: slice(5)
Out[9]: slice(None, 5, None)
In [10]: slice(1, 5)
Out[10]: slice(1, 5, None)
In [11]: slice(1, 5, 2)
Out[11]: slice(1, 5, 2)

Therefore, in your last line, slice(5e-3/(2.6**4), 5e-3) creates a slice object with start = 5e-3/(2.6**4), stop = 5e-3, and step = None.
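You can confirm this by inspecting the slice object's attributes directly, which is roughly all fastai does with it internally (it never actually indexes anything with the slice):

```python
# A slice here is just a convenient container for two numbers.
s = slice(5e-3 / (2.6 ** 4), 5e-3)
print(s.start, s.stop, s.step)  # the third field is None
```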

Thanks a lot @riven314 for this answer on discriminative learning; this resolved a long-standing doubt of mine. Thanks also to @ZahraHkh for raising the question about the step argument.

In this case, slice(1e-5, 1e-4) is essentially slice(start = 1e-5, stop = 1e-4, step = None).

From the source code (fastai2), it triggers discriminative learning: 1e-5 will be the learning rate applied to the earliest layer group and 1e-4 the learning rate applied to the final layer group (the head). The groups in between get learning rates that progress geometrically between 1e-5 and 1e-4.

See the definition of set_hyper for more details on how the learning rates distribute among layers: link
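The geometric spread can be sketched as follows. geom_lrs is my simplified stand-in for what fastai computes in a helper, written in plain Python for illustration:

```python
def geom_lrs(lr_min, lr_max, n_groups):
    # Sketch of the geometric spread described above (not fastai's
    # code): lr_min for the first group, lr_max for the last, equal
    # multiplicative steps in between.
    ratio = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]

for lr in geom_lrs(1e-5, 1e-4, 3):
    print(f"{lr:.2e}")
```

For 3 groups this yields roughly 1e-5, 3.16e-5, and 1e-4: each group's rate is the previous one multiplied by a constant factor.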

Hi Riven,
I am currently working on an image classification project for melanoma classification using fastai. I have 3 layer groups, where the first two belong to the densenet161 backbone and the last is a fully connected head.

In training, I first freeze the model and train it for a few epochs with a small learning rate (found with the lr_find() function). Then I unfreeze the model and train it using the slice function. I am confused about which two learning rates to choose. Is there a way to approximate these rates using the lr_find() function?