When running n epochs with fit_one_cycle, one can see that valid_loss (or even train_loss) does not always decrease monotonically. This may be due to a high learning rate, a small batch size, etc.

I’d like to know whether the model I end up with is the one obtained at the end of the last epoch (which is not necessarily the best one seen across the epochs), or the best model “produced so far”. More concretely, if we run 10 epochs, and the train_loss is 0.8 at epoch 9 and 0.81 at epoch 10, which model is kept after training: the one from epoch 9 or the one from epoch 10?
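To make the two possibilities concrete, here is a minimal sketch in plain Python (not fastai code). The loss values for epochs 1 to 8 and the helper names `last_epoch` and `best_epoch` are made up for illustration; only the epoch 9 and epoch 10 values come from my example above. It does not claim to show what fit_one_cycle actually does, only the two behaviours I am trying to tell apart.

```python
# Hypothetical per-epoch train_loss values. Only the last two (0.80 at
# epoch 9, 0.81 at epoch 10) come from the question; the rest are invented.
losses = [1.50, 1.30, 1.20, 1.10, 1.00, 0.95, 0.90, 0.85, 0.80, 0.81]


def last_epoch(losses):
    """Return the 1-based index of the final epoch."""
    return len(losses)


def best_epoch(losses):
    """Return the 1-based index of the epoch with the lowest loss.

    On ties, the earliest such epoch is returned.
    """
    return min(range(len(losses)), key=lambda i: losses[i]) + 1


# Possibility A: the model is simply whatever the weights are after the last epoch.
print("last epoch:", last_epoch(losses), "loss:", losses[last_epoch(losses) - 1])

# Possibility B: the model is the best one seen so far across all epochs.
print("best epoch:", best_epoch(losses), "loss:", losses[best_epoch(losses) - 1])
```

So the question is whether the returned model corresponds to possibility A (epoch 10) or possibility B (epoch 9).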

Thank you!