Does overshooting really happen in Gradient Descent, or does it just circle around the minima?

Most of us have seen this video from Andrew Ng's Machine Learning course (Gradient Descent in Practice II: Learning Rate by Andrew Ng):

Figure: slide from Andrew Ng's lecture on learning rates.
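Before digging into the question, it helps to see both behaviors numerically. The sketch below (my own toy example, not from the lecture) runs plain gradient descent on f(x) = x², whose gradient is 2x, with three learning rates: one that converges smoothly, one that overshoots past the minimum every step yet still converges, and one that overshoots so badly the iterates diverge:

```python
def gradient_descent(x0, lr, steps):
    """Run gradient descent on f(x) = x**2 (so f'(x) = 2*x) and return all iterates."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x  # update: x <- x * (1 - 2*lr)
        xs.append(x)
    return xs

# lr = 0.1: each step multiplies x by 0.8 -> smooth, monotone convergence to 0
smooth = gradient_descent(1.0, 0.1, 50)

# lr = 0.8: each step multiplies x by -0.6 -> overshoots past 0 every step
# (sign flips), but the magnitude still shrinks, so it converges
oscillating = gradient_descent(1.0, 0.8, 50)

# lr = 1.1: each step multiplies x by -1.2 -> overshoots further each time
# and diverges
diverging = gradient_descent(1.0, 1.1, 50)
```

On this quadratic the update is exactly x ← x·(1 − 2·lr), so the three regimes are easy to read off: |1 − 2·lr| < 1 converges, and overshooting (a negative factor) is compatible with either convergence or divergence depending on the magnitude.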