Short answer: the data you are passing to the model is different on different runs, because of the random data augmentations being used.
Below I show the process I used to track this down. (Most of the steps turned out not to be strictly necessary, but the process is a good general way of debugging this kind of issue.)
There are two functions that could be responsible for this: `learn.lr_find()` and `learn.recorder.plot()`. The first step is to make sure that the `state_dict` of the model stays the same. To do so, I used the code below.
learn.model.state_dict()   # weights before doing anything
learn.lr_find()
learn.model.state_dict()   # weights after lr_find
learn.recorder.plot()
learn.model.state_dict()   # weights after plotting
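Eyeballing printed state dicts is error-prone, so here is a minimal sketch of how the comparison can be automated, assuming a fastai v1 `learn` object backed by PyTorch; the `state_dicts_equal` helper and the `sd_*` names are mine, not part of fastai:

import copy
import torch

def state_dicts_equal(sd1, sd2):
    # True if both state dicts have the same keys and identical tensors
    if sd1.keys() != sd2.keys():
        return False
    return all(torch.equal(sd1[k], sd2[k]) for k in sd1)

sd_before = copy.deepcopy(learn.model.state_dict())
learn.lr_find()
sd_after_find = copy.deepcopy(learn.model.state_dict())
learn.recorder.plot()
sd_after_plot = copy.deepcopy(learn.model.state_dict())

print(state_dicts_equal(sd_before, sd_after_find))      # True in my runs
print(state_dicts_equal(sd_after_find, sd_after_plot))  # True in my runs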
The `state_dict` was the same in all cases. The next step is to check the internal state that is passed to `learn.recorder.plot()`. The idea is: if the internal state (more specifically, the losses) is the same across runs, then the randomness comes from `learn.recorder.plot()`; otherwise it comes from `learn.lr_find()`.
The reason I look at the losses is that `learn.recorder.plot()` only manipulates the recorded losses, so there is nowhere else for it to introduce randomness. I used the code below to check the loss values.
learn.lr_find()
learn.recorder.losses      # losses recorded by the first run
learn.lr_find()
learn.recorder.losses      # losses recorded by the second run
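To avoid comparing the printed lists by eye, you can also capture the losses from two runs and compare them directly. A rough sketch, again assuming the fastai v1 `learn` object (the `losses_run*` names are mine); `learn.recorder.losses` holds scalar tensors, so I convert them to floats first:

learn.lr_find()
losses_run1 = [float(l) for l in learn.recorder.losses]

learn.lr_find()
losses_run2 = [float(l) for l in learn.recorder.losses]

print(losses_run1[:5])
print(losses_run2[:5])
print(losses_run1 == losses_run2)   # False here: the two runs record different losses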
As it turns out, the loss values were different between the two runs, so we now know the randomness is introduced in `learn.lr_find()`. The next step is to check its source code, which is:
start_lr = learn.lr_range(start_lr)
start_lr = np.array(start_lr) if is_listy(start_lr) else start_lr
end_lr = learn.lr_range(end_lr)
end_lr = np.array(end_lr) if is_listy(end_lr) else end_lr
cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
epochs = int(np.ceil(num_it/len(learn.data.train_dl)))
learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
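As a side note, if you want to pull this source up yourself, Python's `inspect` module (or `??` in a notebook) works; the `LRFinder` import path below is the fastai v1 one and is an assumption that may differ in other versions:

import inspect
from fastai.callbacks import LRFinder   # fastai v1 import path (assumption)

print(inspect.getsource(learn.lr_find))  # same as `learn.lr_find??` in a notebook
print(inspect.getsource(LRFinder))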
`start_lr` and `end_lr` are the same in all cases, so the problem must be in `LRFinder`. Looking at the source code of `LRFinder`, we see the following lines.
class LRFinder(LearnerCallback):
"Causes `learn` to go on a mock training from `start_lr` to `end_lr` for `num_it` iterations."
def __init__(self, learn:Learner, start_lr:float=1e-7, end_lr:float=10, num_it:int=100, stop_div:bool=True):
super().__init__(learn)
self.data,self.stop_div = learn.data,stop_div
self.sched = Scheduler((start_lr, end_lr), num_it, annealing_exp)
Here we find the problem: the batches drawn from `self.data` are not the same from run to run (because of the random data augmentations being applied). Hence different loss values, and thus different graphs.
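To convince yourself that the data pipeline really is the culprit, you can draw two batches from the training dataloader and compare them: with shuffling and random augmentations they will almost never match. And if you want two `lr_find()` runs to be comparable, seeding the common RNGs before each call is a reasonable first step. This is only a sketch under those assumptions (the `seed_everything` helper is mine, not a fastai API), and GPU-level non-determinism or other RNG sources may still leave small differences:

import random
import numpy as np
import torch

# Two consecutive batches differ because of shuffling and random augmentations.
xb1, _ = next(iter(learn.data.train_dl))
xb2, _ = next(iter(learn.data.train_dl))
print(torch.equal(xb1, xb2))   # almost always False

def seed_everything(seed=42):
    # Best-effort seeding of the RNGs that the data pipeline may use.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

seed_everything()
learn.lr_find()

seed_everything()
learn.lr_find()   # the batches, and hence the recorded losses, should now match much more closely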