I am trying to build a language model on social reviews. In the process, I tried to fine-tune it by unfreezing the last two layer groups of the pretrained AWD-LSTM, and got the output shown in Iter-1.
Iter-1

```python
from fastai.text import *  # fastai v1

learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)
learn_lm.freeze_to(-2)  # train only the last two layer groups
learn_lm.fit_one_cycle(3, max_lr=slice(2.29e-2/(2.6**4), 2.29e-2), moms=(0.8, 0.7))
```
Iter-2
I re-ran the freeze_to and fit_one_cycle calls, anticipating similar/close loss values and accuracy, but the losses and accuracy in the first cycle were better than in the previous run.
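For clarity, here is a minimal sketch of what Iter-2 re-executed; note that the learn_lm object from Iter-1 was kept, not re-created:

```python
# Same learner object as in Iter-1; only the freeze/fit steps were repeated
learn_lm.freeze_to(-2)
learn_lm.fit_one_cycle(3, max_lr=slice(2.29e-2/(2.6**4), 2.29e-2), moms=(0.8, 0.7))
```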
Iter-3
I continued with more runs, and the results got even better.
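Roughly, the continued runs amounted to repeating the same one-cycle call on the same learner, something like this sketch (the repeat count here is illustrative, not the exact number I used):

```python
# Illustrative only: each extra "iteration" was another one-cycle run
for _ in range(2):  # hypothetical number of extra runs
    learn_lm.fit_one_cycle(3, max_lr=slice(2.29e-2/(2.6**4), 2.29e-2), moms=(0.8, 0.7))
```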
Is this the expected behaviour of fit_one_cycle, or should we expect outputs similar to Iter-1 each time?