Short answer: the data you are passing to the model is different on different runs, because of the random data augmentations being used.
Below I show the process I used to track this down. (Most of the steps turned out not to be strictly necessary, but the process is a good general way of debugging this kind of issue.)
There are two functions that could be responsible for this: `learn.lr_find()` and `learn.recorder.plot()`. The first step is to make sure that the `state_dict` of the model stays the same. To do so, I used the code below.
learn.model.state_dict()   # weights before doing anything
learn.lr_find()
learn.model.state_dict()   # weights after lr_find
learn.recorder.plot()
learn.model.state_dict()   # weights after plotting
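Eyeballing printed state dicts is error-prone, so here is a minimal sketch of how the comparison can be automated, assuming a fastai v1 `learn` object backed by PyTorch; the `state_dicts_equal` helper and the `sd_*` names are mine, not part of fastai:

import copy
import torch

def state_dicts_equal(sd1, sd2):
    # True if both state dicts have the same keys and identical tensors
    if sd1.keys() != sd2.keys():
        return False
    return all(torch.equal(sd1[k], sd2[k]) for k in sd1)

sd_before = copy.deepcopy(learn.model.state_dict())
learn.lr_find()
sd_after_find = copy.deepcopy(learn.model.state_dict())
learn.recorder.plot()
sd_after_plot = copy.deepcopy(learn.model.state_dict())

print(state_dicts_equal(sd_before, sd_after_find))      # True in my runs
print(state_dicts_equal(sd_after_find, sd_after_plot))  # True in my runs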
The `state_dict` was the same in all cases. The next step is to check the internal state that is passed to `learn.recorder.plot()`. The idea is: if the internal state (more specifically, the losses) is the same across runs, then the randomness comes from `learn.recorder.plot()`; otherwise it comes from `learn.lr_find()`.
The reason I look at the losses is that `learn.recorder.plot()` only manipulates the recorded losses, so there is nowhere else for it to introduce randomness. I used the code below to check the loss values.
learn.lr_find()
learn.recorder.losses      # losses recorded by the first run
learn.lr_find()
learn.recorder.losses      # losses recorded by the second run
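To avoid comparing the printed lists by eye, you can also capture the losses from two runs and compare them directly. A rough sketch, again assuming the fastai v1 `learn` object (the `losses_run*` names are mine); `learn.recorder.losses` holds scalar tensors, so I convert them to floats first:

learn.lr_find()
losses_run1 = [float(l) for l in learn.recorder.losses]

learn.lr_find()
losses_run2 = [float(l) for l in learn.recorder.losses]

print(losses_run1[:5])
print(losses_run2[:5])
print(losses_run1 == losses_run2)   # False here: the two runs record different losses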
As it turns out, the loss values were different between the two runs, so we now know the randomness is introduced in `learn.lr_find()`. The next step is to check its source code, which is:
start_lr = learn.lr_range(start_lr)
start_lr = np.array(start_lr) if is_listy(start_lr) else start_lr
end_lr = learn.lr_range(end_lr)
end_lr = np.array(end_lr) if is_listy(end_lr) else end_lr
cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
epochs = int(np.ceil(num_it/len(learn.data.train_dl)))
learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
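As a side note, if you want to pull this source up yourself, Python's `inspect` module (or `??` in a notebook) works; the `LRFinder` import path below is the fastai v1 one and is an assumption that may differ in other versions:

import inspect
from fastai.callbacks import LRFinder   # fastai v1 import path (assumption)

print(inspect.getsource(learn.lr_find))  # same as `learn.lr_find??` in a notebook
print(inspect.getsource(LRFinder))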
`start_lr` and `end_lr` are the same in all cases, so the problem must be in `LRFinder`. Looking at the source code of `LRFinder`, we see the following lines.
class LRFinder(LearnerCallback):
"Causes `learn` to go on a mock training from `start_lr` to `end_lr` for `num_it` iterations."
def __init__(self, learn:Learner, start_lr:float=1e-7, end_lr:float=10, num_it:int=100, stop_div:bool=True):
super().__init__(learn)
self.data,self.stop_div = learn.data,stop_div
self.sched = Scheduler((start_lr, end_lr), num_it, annealing_exp)
Here we find the problem: the batches drawn from `self.data` are not the same from run to run (because of the random data augmentations being applied). Hence different loss values, and thus different graphs.
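To convince yourself that the data pipeline really is the culprit, you can draw two batches from the training dataloader and compare them: with shuffling and random augmentations they will almost never match. And if you want two `lr_find()` runs to be comparable, seeding the common RNGs before each call is a reasonable first step. This is only a sketch under those assumptions (the `seed_everything` helper is mine, not a fastai API), and GPU-level non-determinism or other RNG sources may still leave small differences:

import random
import numpy as np
import torch

# Two consecutive batches differ because of shuffling and random augmentations.
xb1, _ = next(iter(learn.data.train_dl))
xb2, _ = next(iter(learn.data.train_dl))
print(torch.equal(xb1, xb2))   # almost always False

def seed_everything(seed=42):
    # Best-effort seeding of the RNGs that the data pipeline may use.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

seed_everything()
learn.lr_find()

seed_everything()
learn.lr_find()   # the batches, and hence the recorded losses, should now match much more closely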