When we are at the bottom of the validation-set loss curve, we find the largest learning rate that, multiplied by the slope at that point (somewhere near the bottom), does not overshoot.
But after finding that learning rate, when we go on to use the fit function,
we can overshoot, because this time we are not at the bottom, so the slope is larger.
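
To make the concern concrete, here is a minimal sketch on a toy loss. It assumes a one-parameter loss f(w) = w**4 and a plain gradient step; `loss_slope` and `sgd_step` are made-up names for illustration, not anything from the library. The same learning rate gives a small step near the bottom and a huge one further up the slope:

```python
def loss_slope(w):
    # Slope of the toy loss f(w) = w**4, whose minimum is at w = 0 (assumed loss).
    return 4 * w ** 3

def sgd_step(w, lr):
    # Plain gradient step: the move is the learning rate times the slope.
    return w - lr * loss_slope(w)

lr = 0.5
near = sgd_step(0.5, lr)  # start near the bottom, where the slope is small
far = sgd_step(2.0, lr)   # start further up, where the slope is much larger

print(near)  # 0.25: moved closer to the minimum
print(far)   # -14.0: jumped across the minimum and ended up further away
```

On a purely quadratic loss this would not happen, since the step shrinks in proportion to the distance from the minimum, so the toy loss here is deliberately steeper than quadratic.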
Another question:
when using that learning rate for SGD with restarts,
how does each restart make it possible to jump into another, non-spiky valley?
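
For reference, this is roughly what I understand the restart schedule to look like. It is only a sketch, assuming cosine annealing with a fixed cycle length; `sgdr_lr`, `lr_max`, `cycle_len` and `lr_min` are hypothetical names, and the library's own schedule may differ:

```python
import math

def sgdr_lr(iteration, lr_max, cycle_len, lr_min=0.0):
    # Position inside the current cycle; it wraps back to 0 at every restart.
    pos = iteration % cycle_len
    # Cosine decay from lr_max down towards lr_min over one cycle.
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * pos / cycle_len))

# Two cycles of 10 iterations each, with the learning rate found earlier as lr_max.
schedule = [sgdr_lr(i, lr_max=0.1, cycle_len=10) for i in range(20)]
print(schedule[9], schedule[10])  # tiny value at the end of cycle 1, back to lr_max at the restart
```

So at each restart the learning rate goes back up to its maximum, and the next step is as large as the first one was.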
When picking the learning rate with the LR finder, we were supposed to choose the largest learning rate that does not overshoot.
As you said, we pick it somewhere near the bottom, where the slope has decreased.
So later, when we use it for the fit, we are not necessarily near the bottom,
so the slope is larger, and multiplying that slope by our learning rate can lead to overshooting.