Why did the authors use a quadratic equation for SGD?

arnuld · November 8, 2020, 11:14am

In the Stochastic Gradient Descent (SGD) section, there is an example where we want to measure the speed of a roller coaster every second for 20 seconds. The authors use this equation as the basis:

y = at^2 + bt + c, where t = time

My question is why not use a linear equation?

y = mt + b

nn.Charles · November 8, 2020, 11:52am

Hi @arnuld,

You would not be able to fit a “U-shaped” function with a linear equation.

Hope it helps,
Charles

arnuld · November 8, 2020, 11:59am

Hmm… you mean a parabolic curve represents a varying speed: higher, high, lower, low, and zero at different times, and that kind of variation can’t be represented by a straight line. Am I correct?

nn.Charles · November 8, 2020, 2:44pm

We are trying to approximate this curve.
[Image: Screenshot 2020-11-08 at 15.42.19]

This can only be done with an equation of degree 2 or higher. A linear equation can only represent a straight line.
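
To see this concretely, here is a minimal sketch (not the book's code). It assumes a noise-free, U-shaped stand-in for the speed data, `0.75 * (t - 9.5)**2 + 1`, and fits the best possible straight line to it with the closed-form least-squares formulas. The names `times`, `speeds`, and `line_mse` are made up for this illustration.

```python
# Noise-free stand-in for the roller coaster data (an assumption):
# speed is high at the start, lowest near t = 9.5, and high again at the end.
times = list(range(20))
speeds = [0.75 * (t - 9.5) ** 2 + 1 for t in times]


def mean(values):
    return sum(values) / len(values)


# Closed-form least-squares fit of the line y = m*t + b.
t_mean, y_mean = mean(times), mean(speeds)
m = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, speeds)) / sum(
    (t - t_mean) ** 2 for t in times
)
b = y_mean - m * t_mean

# Mean squared error of that best line against the U-shaped data.
line_mse = mean([(m * t + b - y) ** 2 for t, y in zip(times, speeds)])

print(f"best line: y = {m:.3f}*t + {b:.3f}")
print(f"mean squared error of the best line: {line_mse:.1f}")
```

On this symmetric data the best line is essentially flat: it sits at the average speed and misses both the high ends and the low middle, so its error stays large no matter how m and b are chosen.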

arnuld · November 10, 2020, 11:48am

Why are we trying to approximate this curve? Why not approximate a Z- or M-shaped curve? How do we know that a roller coaster graphs like a parabola?

[Image: Fastest-Roller-Coaster-in-the-World]

DeepBlender · November 10, 2020, 1:19pm

It is just a learning example. The point of the exercise is to show that SGD can be used to approximate curves.
More complicated shapes can be approximated too, and that is literally what deep learning is about (though every input can only produce one output, so Z shapes cannot be approximated, and your roller coaster image wouldn’t work either). However, approximating more complicated shapes requires more complicated functions. How this can be done is described in the follow-up lectures.
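
As a rough illustration of that point, here is a minimal sketch of fitting the three parameters of a quadratic by gradient descent. It is not the book's code and rests on several assumptions: the data is a noise-free stand-in (`0.75 * (t - 9.5)**2 + 1`), the gradients are written out by hand instead of using autograd, every step uses all 20 points (full-batch gradient descent, not mini-batches), and time is rescaled to [-1, 1] so that a single hand-picked learning rate works for all three parameters.

```python
# Noise-free stand-in data (an assumption): a U-shaped speed curve.
times = list(range(20))
speeds = [0.75 * (t - 9.5) ** 2 + 1 for t in times]

# Rescale time to [-1, 1] so one learning rate suits all three parameters
# (a choice made for this sketch).
us = [(t - 9.5) / 9.5 for t in times]


def predict(params, u):
    a, b, c = params
    return a * u ** 2 + b * u + c


def mse(params):
    return sum((predict(params, u) - y) ** 2 for u, y in zip(us, speeds)) / len(us)


def gradient(params):
    # Partial derivatives of the mean squared error w.r.t. a, b and c.
    n = len(us)
    grad_a = grad_b = grad_c = 0.0
    for u, y in zip(us, speeds):
        err = predict(params, u) - y
        grad_a += 2 * err * u ** 2 / n
        grad_b += 2 * err * u / n
        grad_c += 2 * err / n
    return grad_a, grad_b, grad_c


params = (0.0, 0.0, 0.0)  # arbitrary starting guess
lr = 0.5                  # learning rate, hand-picked for this data
initial_loss = mse(params)

for step in range(2000):
    grads = gradient(params)
    # Move each parameter a small step against its gradient.
    params = tuple(p - lr * g for p, g in zip(params, grads))

final_loss = mse(params)
print(f"loss went from {initial_loss:.1f} to {final_loss:.6f}")
print("fitted a, b, c (in rescaled time):", [round(p, 3) for p in params])
```

The loop never needs to be told that the data is a parabola with particular coefficients. It only needs a function with enough flexibility (here, three parameters) and a loss to reduce, which is why a more flexible function is needed for more complicated shapes.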