Hi all,
When running `learn.fit_one_cycle`, a param scheduler is used to adjust the learning rate as training goes.
At some point training reaches a plateau, so I tried using `ReduceLROnPlateau` to reduce the learning rate. But it only changes the `lr` hyperparameter, not the LR policy schedulers, and hence has no effect.
Any ideas on how to address this and enable reducing the LR inside the scheduler in a generic way? Is anything already planned in this area?
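To make the problem concrete, here is a minimal plain-Python sketch (not fastai's actual code; `Opt`, `hypers`, and `scheduled_lr` are stand-in names I made up) of why the plateau reduction gets lost: the param scheduler recomputes and overwrites `lr` on every step, clobbering whatever `ReduceLROnPlateau` set.

```python
class Opt:
    """Stand-in for an optimizer holding a mutable `lr` hyperparameter."""
    def __init__(self):
        self.hypers = {"lr": 0.0}

def scheduled_lr(pct, lr_max=1e-2):
    """Stand-in for the one-cycle curve: here just a linear decay."""
    return lr_max * (1 - pct)

opt = Opt()
opt.hypers["lr"] = scheduled_lr(0.4)   # scheduler sets lr for this step
opt.hypers["lr"] *= 0.1                # ReduceLROnPlateau-style cut
opt.hypers["lr"] = scheduled_lr(0.5)   # next step: scheduler overwrites it
# the 10x reduction is gone; lr is back on the original schedule
```

The point is that any plateau-driven reduction has to feed into the schedule itself (e.g. its `lr_max`), not into the `lr` value the schedule keeps overwriting.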
A few questions regarding this:
- Does it matter to wait until the end of a cycle to determine there is a plateau?
- Is it worth just reducing the main `lr_max` parameter, or are there others worth changing?
- To pass the change to the scheduler, should it be recreated, should the `sched` parameters be wrapped, or something else?
- `lr_find()` simulates `self.fit()` with different values. If it's to be used to find an optimal LR for `fit_one_cycle`/`fit_sgdr`/others, should it use those instead?
- What is actually the best value from `LRFinder` to pass as the initial `lr`/`lr_max`?
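For the "wrap the sched parameters" option, here is a rough sketch of what I have in mind, assuming nothing about fastai's internals (the schedule shape, the class, and all names are my own invention): a wrapper that owns `lr_max`, detects a plateau from validation loss, and scales `lr_max` so the reduction propagates to every subsequent scheduled value.

```python
import math

def one_cycle_lr(pct, lr_max, div=25.0, div_final=1e5):
    """Simplified cosine one-cycle curve: warm up to lr_max over the first
    25% of training, then anneal down. A stand-in for the real schedule."""
    if pct < 0.25:                      # warm-up phase
        p = pct / 0.25
        start, end = lr_max / div, lr_max
    else:                               # annealing phase
        p = (pct - 0.25) / 0.75
        start, end = lr_max, lr_max / div_final
    return start + (end - start) * (1 - math.cos(math.pi * p)) / 2

class PlateauScaledSchedule:
    """Wraps the schedule and scales lr_max when validation loss stops
    improving, so the cut affects the schedule itself, not just `lr`."""
    def __init__(self, lr_max, patience=2, factor=0.5):
        self.lr_max, self.patience, self.factor = lr_max, patience, factor
        self.best, self.wait = float("inf"), 0

    def on_epoch_end(self, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr_max *= self.factor  # shrink the whole curve
                self.wait = 0

    def lr(self, pct):
        return one_cycle_lr(pct, self.lr_max)

sched = PlateauScaledSchedule(lr_max=1e-2)
peak_before = sched.lr(0.25)            # peak of the cycle: 1e-2
for loss in (1.0, 1.1, 1.2):            # no improvement for 2 epochs
    sched.on_epoch_end(loss)
peak_after = sched.lr(0.25)             # peak halved: 5e-3
```

This avoids recreating the scheduler mid-cycle, but it does implicitly answer question 1 with "detect at epoch end", which may or may not be the right granularity.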
Thanks!