What does slice(lr) mean in fit_one_cycle()?

In Lesson 3 - planet, I saw these 2 lines of code:

``````
lr = 0.01
learn.fit_one_cycle(5, slice(lr))
``````

If it were slice(min_lr, max_lr), then I understand that fit_one_cycle() would spread learning rates between min_lr and max_lr. (Hopefully my understanding of this is correct.)

But in this case slice(lr) only has one parameter.

What are the differences between fit_one_cycle(5, lr) and fit_one_cycle(5, slice(lr)) ?
And what are the benefits of using slice(lr) instead of lr directly?


With the former, every parameter group will use a learning rate of `lr`, whereas with the latter, the last parameter group will use a learning rate of `lr`, while the other groups will have `lr/10`.
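The rule described above can be sketched in a few lines. This is a simplified, dependency-free approximation of fastai v1's `Learner.lr_range` (which uses `even_mults` for the geometric spacing); treat it as an illustration, not the actual library code:

```python
# Sketch of how fastai v1 turns a learning rate (float or slice) into one
# value per layer group. Approximates Learner.lr_range / even_mults.
def lr_range(lr, n_groups):
    if not isinstance(lr, slice):
        # Plain float: every layer group uses the same learning rate.
        return [lr] * n_groups
    if lr.start is not None:
        # slice(min_lr, max_lr): geometric spacing from start to stop.
        ratio = (lr.stop / lr.start) ** (1 / (n_groups - 1))
        return [lr.start * ratio ** i for i in range(n_groups)]
    # slice(lr): last group gets lr, all earlier groups get lr / 10.
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]

print(lr_range(0.01, 3))               # → [0.01, 0.01, 0.01]
print(lr_range(slice(0.01), 3))        # → [0.001, 0.001, 0.01]
print(lr_range(slice(1e-5, 1e-3), 3))  # geometric: roughly 1e-5, 1e-4, 1e-3
```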


Thanks @immarried. When you say "the former", do you mean using slice(lr)?
And by "the latter", do you mean using slice(min_lr, max_lr)?

I'm new to the AI and Python world, so I'm missing a lot of concepts.

E.g. layer_groups: is it an epoch?

And what is the syntax of:

1. `method(): return lr`? => `isinstance(lr, slice): return lr`
2. `if lr.start: res = xxxx`

Yes, say you have 3 layer groups: group 1, 2 and 3. `max_lr=slice(1)` means that the learning rate for group 3 is 1, and 0.1 for groups 1 and 2. `max_lr=1` means the learning rate is 1 for groups 1, 2 and 3.

A model's weights/parameters can be divided into different groups, called "parameter groups" or "layer groups". You can give each group a different learning rate, and then during training, parameters from different groups will be updated using these different learning rates.
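As a dependency-free illustration of that idea (in practice this is what PyTorch optimizer parameter groups do; `sgd_step` and its dictionary layout here are hypothetical), a single SGD update that applies a different learning rate to each group might look like:

```python
# Toy SGD step: each parameter group carries its own learning rate,
# so its parameters move by a different amount for the same gradient.
def sgd_step(groups, grads):
    updated = {}
    for name, group in groups.items():
        lr = group["lr"]
        updated[name] = [p - lr * g for p, g in zip(group["params"], grads[name])]
    return updated

groups = {
    "body": {"lr": 1e-4, "params": [1.0, 2.0]},  # early layers: small lr
    "head": {"lr": 1e-3, "params": [3.0]},       # final layers: 10x larger lr
}
grads = {"body": [10.0, 10.0], "head": [10.0]}
print(sgd_step(groups, grads))  # body moves by 0.001 per weight, head by 0.01
```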


How is this going to help us? I mean, having different learning rates for different groups?
And besides that, doesn't slice in Python work with these parameters: slice(start, stop, step)? I can't recognize those parameters in this line of code:
`learn.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3), moms=(0.8,0.7))`

@ZahraHkh
Applying different learning rates to different groups is a technique called "discriminative layer training" that is introduced in part 1. This technique is commonly used in both computer vision and natural language processing. You can refer to this fastai doc for more details.

I am not sure if I fully understand your second question. Were you asking why slice() is passed only 1 or 2 arguments here, when it in fact takes up to 3?

slice() can be passed 1, 2, or 3 arguments; any missing ones default to None. Below is a snippet of experiments for your reference:

``````In [9]: slice(5)
Out[9]: slice(None, 5, None)

In [10]: slice(1, 5)
Out[10]: slice(1, 5, None)

In [11]: slice(1, 5, 2)
Out[11]: slice(1, 5, 2)
``````

Therefore, in your last line, `slice(5e-3/(2.6**4), 5e-3)` is equivalent to a slice with start = 5e-3/(2.6**4), stop = 5e-3 and step = None.

I hope this helps!


Yes, that was what I meant.
It indeed helped me a lot.


Thanks a lot @riven314 for this answer on discriminative learning; it cleared up an unresolved doubt and helped me a lot. Also thanks to @ZahraHkh for raising the doubt about the step size.


You are welcome


Hi,

This cleared up a lot of ambiguity. But I am still stuck on using slice for max_lr with two parameters:

`learn.fit_one_cycle(20, max_lr=slice(1e-5,1e-4))`

Thank you,

In this case, `slice(1e-5, 1e-4)` is essentially `slice(start = 1e-5, stop = 1e-4, step = None)`.

From the source code (fastai2), it triggers discriminative learning: `1e-5` will be the learning rate applied to the earliest layers and `1e-4` to the final layers. The layers in between will get learning rates somewhere between `1e-5` and `1e-4`, spaced as a geometric progression.

See the definition of `set_hyper` for more details on how the learning rates distribute among layers: link
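As a rough sketch of that geometric progression (assuming 5 layer groups purely for illustration; the exact spacing in fastai2 is computed inside `set_hyper`):

```python
# Spread learning rates geometrically between 1e-5 and 1e-4 over 5 groups.
start, stop, n_groups = 1e-5, 1e-4, 5
ratio = (stop / start) ** (1 / (n_groups - 1))   # common ratio between groups
lrs = [start * ratio ** i for i in range(n_groups)]
print(lrs)  # first group ≈ 1e-5, last group ≈ 1e-4, smoothly increasing
```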


Hi Riven,
I am currently working on an image classification project for melanoma classification using fastai. I have 3 layer groups, where the first two belong to the densenet161 model and the last is a fully connected layer group.

In training, I first freeze the model and train it for a few epochs on a small learning rate (found by lr_find() function ). Then, I unfreeze the model and train it using the slice function. I am confused of which two learning rates to chose. Is there a way to approximate these rates using the lr_find() function?