How to properly use the 1cycle policy

Hi, I’m a bit confused about how to interpret the following graph:

[image: learning-rate finder plot (loss vs. learning rate)]

Here’s what I did:

max_lr = 0.0009
wd = 1e-4

# 1cycle policy, as in the research paper
learn.fit_one_cycle(cyc_len=6, max_lr=max_lr, wd=wd)


max_lr = 0.00009
wd = 1e-4
max_lrs = [max_lr/9, max_lr/3, max_lr]
wds = [wd/9, wd/3, wd]

learn.unfreeze()
learn.fit_one_cycle(cyc_len=6, max_lr=max_lrs, wd=wds)


max_lr = 0.000009
wd = 1e-4
max_lrs = [max_lr/9, max_lr/3, max_lr]
wds = [wd/9, wd/3, wd]

learn.unfreeze()
learn.fit_one_cycle(cyc_len=6, max_lr=max_lrs, wd=wds)

Is this the ideal way to do it?

Oh, and the lr with the minimum loss is 0.009; you just can’t see it here.
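
For context, here is a minimal sketch of how such a plot is usually produced and read (fastai v1; the slice idiom is an alternative to the manual per-group lists above, and the exact numbers are just assumptions for illustration):

# Minimal sketch (fastai v1): produce and read the LR-finder plot.
learn.lr_find()        # LR range test: train a few mini-batches with an increasing LR
learn.recorder.plot()  # plot loss vs. learning rate (the graph above)

# Common rule of thumb: pick max_lr roughly an order of magnitude below
# the loss minimum, e.g. ~1e-3 if the minimum sits around 0.009.
learn.unfreeze()
learn.fit_one_cycle(6, max_lr=slice(1e-5, 1e-3), wd=1e-4)  # slice spreads LRs across layer groups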

Hi, is running fit_one_cycle(3) different from running fit_one_cycle(1) three times?
If so, which one gives better accuracy?

Yes, they are different.
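
One way to see it (a sketch, assuming a fastai v1 learn; compare on fresh copies of the learner for a fair test): plot the learning-rate schedule each approach produces.

# Sketch: compare the LR schedules (fastai v1).
learn.fit_one_cycle(3)    # one cycle spanning 3 epochs: a single warm-up then anneal
learn.recorder.plot_lr()  # one rise-and-fall across all three epochs

for _ in range(3):
    learn.fit_one_cycle(1)  # three 1-epoch cycles: warm-up and anneal restart every epoch
learn.recorder.plot_lr()    # the Recorder resets per fit call, so this shows only the last short cycle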

Oh, which one is better?

I think fit_one_cycle(3) is better, because I’ve never seen anyone run fit_one_cycle(1) three times.

Take a look here :slight_smile:

Understood, thanks.
But since I’m not sure whether I’m over-fitting, I was running fit_one_cycle again and again while keeping track of the accuracy.
Is there any other way to do this?
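
One alternative, as a sketch using fastai v1’s tracker callbacks (this assumes accuracy is among learn.metrics): let callbacks watch the metric instead of rerunning fit_one_cycle by hand.

from fastai.callbacks import EarlyStoppingCallback, SaveModelCallback

# Sketch: stop when accuracy stops improving and keep the best checkpoint.
learn.fit_one_cycle(10, max_lr=1e-3, callbacks=[
    EarlyStoppingCallback(learn, monitor='accuracy', min_delta=0.001, patience=3),
    SaveModelCallback(learn, monitor='accuracy', name='best'),
])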

I am looking at the documentation (in the GitHub repository).

I didn’t find it documented, but the method Learner.fit_one_cycle() is presumably essentially the same as calling the function fit_one_cycle().

The signature of the function fit_one_cycle() reads:

def fit_one_cycle(learn:Learner, cyc_len:int, max_lr:Union[Floats,slice]=defaults.lr,
                  moms:Tuple[float,float]=(0.95,0.85), div_factor:float=25., pct_start:float=0.3,
                  final_div:float=None, wd:float=None, callbacks:Optional[CallbackList]=None,
                  tot_epochs:int=None, start_epoch:int=None)->None:

This function fit_one_cycle() ends with a call to the method Learner.fit():

learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)

noting that the argument cyc_len is passed as the parameter epochs, according to the signature of Learner.fit():

def fit(self, epochs:int, lr:Union[Floats,slice]=defaults.lr,
            wd:Floats=None, callbacks:Collection[Callback]=None)->None:

In other words, the cycle length appears to be the same as the number of epochs.

The method Learner.fit() ends with a call to the function fit():

fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)

whose signature is:

def fit(epochs:int, learn:BasicLearner, callbacks:Optional[CallbackList]=None, metrics:OptMetrics=None)->None:

At this point, I would conclude that:

Whether learner.fit_one_cycle(3) is different from calling learner.fit_one_cycle(1) 3 times boils down to whether learner.fit(3) is different from calling learner.fit(1) 3 times.

Now, in the definition of the function fit() there is a loop over epochs. I am not versed enough in callbacks to know exactly what they do, but on first inspection I didn’t see anything that would make this loop inside fit() behave differently from a loop “outside” fit().
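
For reference, here is a paraphrased sketch of that loop (simplified from fastai v1’s basic_train.py; progress bars, validation, and exception handling are omitted, and fit_sketch is my name, not the library’s):

from fastai.basic_train import loss_batch
from fastai.callback import CallbackHandler

def fit_sketch(epochs, learn, callbacks=None, metrics=None):
    cb_handler = CallbackHandler(callbacks, metrics)
    cb_handler.on_train_begin(epochs, pbar=None, metrics=metrics)
    for epoch in range(epochs):                    # the loop in question
        learn.model.train()
        cb_handler.on_epoch_begin()
        for xb, yb in learn.data.train_dl:
            xb, yb = cb_handler.on_batch_begin(xb, yb)
            loss = loss_batch(learn.model, xb, yb, learn.loss_func,
                              learn.opt, cb_handler)
            if cb_handler.on_batch_end(loss): break
        if cb_handler.on_epoch_end(None): break    # callbacks may request an early stop
    cb_handler.on_train_end(None)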

OK, so I missed Nathan’s answer. Looking at the source code, I guess everything lies in what callback functions do…
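
Indeed, the body of fit_one_cycle (fastai v1, lightly abridged) confirms it: all it adds on top of learn.fit() is a OneCycleScheduler callback, and that scheduler builds one LR/momentum schedule spanning all cyc_len epochs. Three separate 1-epoch calls therefore get three short schedules instead of one long one.

from fastai.core import defaults, listify
from fastai.callbacks import OneCycleScheduler

def fit_one_cycle(learn, cyc_len, max_lr=defaults.lr, moms=(0.95, 0.85),
                  div_factor=25., pct_start=0.3, final_div=None, wd=None,
                  callbacks=None, tot_epochs=None, start_epoch=None):
    "Fit a model following the 1cycle policy."
    max_lr = learn.lr_range(max_lr)       # expand a slice/list across layer groups
    callbacks = listify(callbacks)
    callbacks.append(OneCycleScheduler(   # the schedule spans all cyc_len epochs
        learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
        final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
    learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)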

Maybe this can help you. If you want to plot your accuracy (or whatever metric you want to track), use this. Hope that helps :slight_smile:
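
For example (fastai v1; a minimal sketch, assuming accuracy is in learn.metrics):

# Sketch: the Recorder tracks losses and metrics during training.
learn.fit_one_cycle(6, max_lr=1e-3)
learn.recorder.plot_losses()   # training/validation loss curves
learn.recorder.plot_metrics()  # one plot per tracked metric (e.g. accuracy), per epoch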

Thanks, I’ll try it :slight_smile:

Hi Nathan,

Thanks for your help. My question was more about how to interpret the graph: what should my max_lr, wd, and wds be?