I believe the idea of fit_one_cycle is to start with a small learning rate but high momentum, then increase the learning rate and drop the momentum for roughly the first half of training, and reverse both trends for the second half. A second cycle starts again with high momentum but relatively little accumulated gradient history, so the beginning of that cycle isn't strongly tethered to the previous solution and can bounce out of a local minimum before stabilizing.
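
To make the shape of that schedule concrete, here's a rough sketch of what I mean. The parameter names (lr_max, div, pct_start, mom_max/mom_min) are modeled on fit_one_cycle's arguments, but the specific defaults and the cosine interpolation below are my assumptions, not a copy of the library's internals:

```python
import math

def one_cycle(pct, lr_max=1e-3, div=25.0, div_final=1e5,
              mom_max=0.95, mom_min=0.85, pct_start=0.5):
    """Return (lr, momentum) at training fraction `pct` in [0, 1].

    Phase 1 (first `pct_start` of training): lr rises from lr_max/div
    to lr_max while momentum falls from mom_max to mom_min.
    Phase 2 (the rest): lr anneals down toward lr_max/div_final while
    momentum climbs back up to mom_max.
    """
    def interp(start, end, frac):
        # cosine-shaped interpolation from start to end
        return start + (end - start) * (1 - math.cos(math.pi * frac)) / 2

    if pct < pct_start:
        frac = pct / pct_start
        return interp(lr_max / div, lr_max, frac), interp(mom_max, mom_min, frac)
    frac = (pct - pct_start) / (1 - pct_start)
    return interp(lr_max, lr_max / div_final, frac), interp(mom_min, mom_max, frac)

# Sample the schedule at a few points to see lr and momentum move in opposite directions
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    lr, mom = one_cycle(p)
    print(f"pct={p:.2f}  lr={lr:.2e}  mom={mom:.3f}")
```

Running a second cycle just means applying this same schedule again from pct = 0, which is why the momentum jumps back to its high value at the start.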