Is this why I saw opt.lr overshooting my target lr the other day?
Edit: actually I bet it’s because of the opposite… since momentum goes down, the dampening does too, so the amount added from the new gradient increases compared to the amount carried over from previous iterations.
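
To make that concrete, here's a rough sketch of what I mean. It assumes the buffer update has the same form as PyTorch's SGD (`buf = momentum * buf + (1 - dampening) * grad`) and, importantly, that dampening is set equal to momentum; that second part is my guess about the setup, not something I've confirmed. The helper names are made up for illustration.

```python
def momentum_step(buf, grad, momentum, dampening):
    """One momentum-buffer update, in the form used by SGD with dampening."""
    return momentum * buf + (1.0 - dampening) * grad


def mixing_weights(momentum):
    """Weights on the old buffer and on the new gradient.

    Assumption (not confirmed): dampening is tied to momentum.
    """
    dampening = momentum
    return momentum, 1.0 - dampening


if __name__ == "__main__":
    # As momentum is annealed downward, the new gradient's share goes up.
    for m in (0.95, 0.90, 0.85):
        old_w, new_w = mixing_weights(m)
        print(f"momentum={m:.2f}  old buffer weight={old_w:.2f}  new grad weight={new_w:.2f}")
```

So under that assumption, a drop in momentum shifts weight from the accumulated history onto the incoming gradient, which is the effect I was describing.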