What does slice(lr) mean in fit_one_cycle()?

In Lesson 3 - planet, I saw these 2 lines of code:

lr = 0.01
learn.fit_one_cycle(5, slice(lr))

With slice(min_lr, max_lr), I understand that fit_one_cycle() will spread learning rates between min_lr and max_lr across the layer groups. (Hopefully my understanding of this is correct.)

But in this case, slice(lr) has only one argument.

What are the differences between fit_one_cycle(5, lr) and fit_one_cycle(5, slice(lr))?
And what are the benefits of using slice(lr) instead of passing lr directly?

With the former, every parameter group will use a learning rate of lr, whereas with the latter, the last parameter group will use a learning rate of lr, while the other groups will have lr/10.
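
Concretely, assuming a hypothetical model with 3 layer groups:

lr = 0.01
# learn.fit_one_cycle(5, lr)         -> all 3 groups train at 0.01
# learn.fit_one_cycle(5, slice(lr))  -> groups train at [0.001, 0.001, 0.01]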

Thanks @immarried. When you say "the former", do you mean using slice(lr)?
And by "the latter", do you mean using slice(min_lr, max_lr)?

I'm new to the AI and Python world, so I'm missing a lot of concepts.

E.g. layer_groups: is a layer group the same thing as an epoch?

And what is the syntax of these lines?

  1. if isinstance(lr, slice): return lr
  2. if lr.start: res = xxxx

Yes, say you have 3 layer groups: group 1, 2 and 3. max_lr=slice(1) means that the learning rate for group 3 is 1, and 0.1 for groups 1 and 2. max_lr=1 means the learning rate is 1 for groups 1, 2 and 3.

A model’s weights/parameters can be divided into different groups, called “parameter groups” or “layer groups”. You can give each group a different learning rate, and then during training, parameters from different groups will be updated using these different learning rates.
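
To make this concrete, here is a minimal sketch of the dispatch logic, loosely based on fastai v1's Learner.lr_range (the real method uses even_mults and reads the group count from self.layer_groups, so treat this as an approximation). The isinstance(lr, slice) and lr.start lines you asked about come from that method; in Python, an if with a colon may put its body on the same line:

import numpy as np

# Sketch: expand an lr argument into one learning rate per layer group.
def lr_range(lr, n_groups=3):
    if not isinstance(lr, slice):
        return np.full(n_groups, float(lr))   # plain lr: same rate everywhere
    if lr.start:
        # slice(start, stop): geometric progression from start to stop
        return np.geomspace(lr.start, lr.stop, n_groups)
    # slice(stop): stop/10 for every group except the last, which gets stop
    return np.array([lr.stop / 10] * (n_groups - 1) + [lr.stop])

print(lr_range(1))                  # [1. 1. 1.]
print(lr_range(slice(1)))           # [0.1 0.1 1. ]
print(lr_range(slice(1e-5, 1e-4)))  # ~[1e-05, 3.16e-05, 1e-04]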

How is this going to help us? I mean, what is the benefit of having a different lr for different groups?
Besides that, doesn't slice in Python take the parameters slice(start, stop, step)? I can't recognize these parameters in this line of code:

learn.fit_one_cycle(1, slice(5e-3/(2.6**4), 5e-3), moms=(0.8, 0.7))

@ZahraHkh
Applying a different lr to different groups is a technique called "discriminative layer training", which is introduced in Part 1. This technique is commonly used in both computer vision and natural language processing. You can refer to this fastai doc for more details.
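
For example, a common fastai training pattern looks like this (a sketch with placeholder learning rates, assuming learn is an already-created Learner):

learn.freeze()                              # first train only the head
learn.fit_one_cycle(3, 1e-3)
learn.unfreeze()                            # then fine-tune every layer group
learn.fit_one_cycle(3, slice(1e-5, 1e-3))   # earlier groups get smaller lrs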

I am not sure I fully understand your second question. Were you asking why slice() can be passed only 1 or 2 arguments here, when it in fact takes up to 3?

slice() can indeed be called with only 1 or 2 arguments; the missing arguments default to None. Below is a snippet of experiments for your reference:

In [9]: slice(5)
Out[9]: slice(None, 5, None)

In [10]: slice(1, 5)
Out[10]: slice(1, 5, None)

Therefore, in your last line, slice(5e-3/(2.6**4), 5e-3) is equivalent to slice(start=5e-3/(2.6**4), stop=5e-3, step=None).
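
As a quick sanity check, you can unpack the slice yourself (the 2.6**4 divisor is the discriminative fine-tuning factor popularized by the ULMFiT paper):

lr = slice(5e-3 / (2.6 ** 4), 5e-3)
print(lr.start, lr.stop, lr.step)  # ~0.000109415 0.005 None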

I hope this helps!

Thank you so much for your complete and thorough answer. 🙂
Yes, that was what I meant, and it indeed helped me a lot. 🙂

Thanks a lot @riven314 for this answer on discriminative learning; it cleared up a long-unresolved doubt of mine. Thanks also to @ZahraHkh for raising the question about the step argument.

You are welcome 🙂

Hi,

This cleared up a lot of ambiguity, but I am still stuck on using a two-argument slice for max_lr:

learn.fit_one_cycle(20, max_lr=slice(1e-5, 1e-4))

Thank you,

In this case, slice(1e-5, 1e-4) is essentially slice(start=1e-5, stop=1e-4, step=None).

From the source code (fastai2), it triggers discriminative learning: 1e-5 will be the learning rate applied to the earliest layer group (closest to the input), and 1e-4 will be the learning rate applied to the last layer group (the head). The groups in between get learning rates spread between 1e-5 and 1e-4 in a geometric progression.

See the definition of set_hyper for more details on how the learning rates are distributed across layers: link
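
For reference, the geometric spreading is done by fastai's even_mults helper; here is a minimal sketch of the idea (the real function also handles the n == 1 edge case):

import numpy as np

# Sketch of even_mults: n values stepping geometrically from start to stop.
def even_mults(start, stop, n):
    step = (stop / start) ** (1 / (n - 1))
    return np.array([start * step ** i for i in range(n)])

print(even_mults(1e-5, 1e-4, 3))
# [1.00000000e-05 3.16227766e-05 1.00000000e-04]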

Hi Riven,
I am currently working on an image classification project for melanoma classification, using fastai. I have 3 layer groups: the first two belong to the densenet161 backbone, and the last is a fully connected layer group.

In training, I first freeze the model and train it for a few epochs at a small learning rate (found with the lr_find() function). Then I unfreeze the model and train it using the slice function. I am confused about which two learning rates to choose. Is there a way to approximate these rates using lr_find()?