On a side note, it’s really cool to visualize how the hyperparameters (learning rate and momentum) evolve batch after batch. See the following picture: