I am trying to build a language model on social reviews. In the process, I tried to fine-tune it by unfreezing the last two layers of the AWD-LSTM model and got the output shown in Iter-1.
```python
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)
learn_lm.fit_one_cycle(3, max_lr=slice(2.29e-02/(2.6**4), 2.29e-02), moms=(0.8, 0.7))
```
I re-ran lines 2 and 3 expecting similar/close loss and accuracy values, but in the first cycle the losses and accuracy were already better than in the previous run. When I continued for more runs, the results got even better.
Is this the expected behaviour of fit_one_cycle, or should we expect outputs similar to those of Iter-1?
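For context, one possible explanation (an assumption, since I can't see the full notebook state) is that if only the fit call is re-run without re-creating the learner, training simply continues from the already-updated weights, so the loss keeps improving. A toy, fastai-free sketch of that statefulness, using a hypothetical `ToyLearner` class:

```python
# Toy illustration (NOT fastai): a "learner" whose fit() mutates its
# weights in place, the way repeated fit calls on the same learner
# object continue from wherever the previous run left off.

class ToyLearner:
    def __init__(self):
        self.w = 10.0            # fresh, init-like weight

    def loss(self):
        return self.w ** 2       # simple quadratic loss, minimum at w = 0

    def fit(self, steps=5, lr=0.1):
        for _ in range(steps):
            self.w -= lr * 2 * self.w   # one gradient step on w**2
        return self.loss()

learn = ToyLearner()
first = learn.fit()              # like Iter-1
second = learn.fit()             # re-running fit: continues from trained weights
assert second < first            # loss keeps improving across re-runs

fresh = ToyLearner().fit()       # re-creating the learner reproduces Iter-1
assert abs(fresh - first) < 1e-9
```

By contrast, if the learner is re-created from scratch (line 2 re-run as well), results should land close to Iter-1, though not identical, since dropout and batch shuffling are stochastic.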