Cost function is a synonym for loss function. To “learn” or “optimize” the model, you minimize that function.

In mathematical optimization, statistics, decision theory, machine learning and computational neuroscience, a loss function or cost function is a function that maps an event […]

A cost function and a loss function are indeed the same thing. “Error”, in the sense of SSE and MSE, is the difference between the predicted value and the actual value. SSE is calculated by squaring each error, and then summing them. MSE is the sum of squared errors divided by the number of data points. Both of these are valid cost/loss functions.
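To make the two definitions concrete, here is a minimal Python sketch. The function names `sse` and `mse` and the sample numbers are made up for illustration; they are not from the course material.

```python
def sse(predictions, targets):
    """Sum of squared errors: square each (predicted - actual) error, then sum."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


def mse(predictions, targets):
    """Mean squared error: the SSE divided by the number of data points."""
    return sse(predictions, targets) / len(targets)


# Hypothetical predictions and actual values, chosen only for illustration.
predictions = [2.5, 0.0, 2.0]
targets = [3.0, -0.5, 2.0]

print("SSE:", sse(predictions, targets))  # errors are -0.5, 0.5, 0.0
print("MSE:", mse(predictions, targets))
```

Both functions return a single number that shrinks as the predictions get closer to the actual values, which is what makes them usable as something to minimize.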

Thanks for this explanation @munyari, it was very clear.

As I work through the course and try to understand the topics, one thing that confuses me a little is why we need the derivative of the cost function.

For example, if I increase a weight by a “little bit”, and the cost function output goes down, then can’t I confidently increase the weight by a “little bit” based on that cost function alone? I am trying to understand what the derivative tells me with respect to the output of the cost function.

I know it has been explained in a lesson somewhere, and I will be reviewing it again.
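The “little bit” experiment described above can be tried directly. The sketch below assumes a toy one-weight model `y = w * x` with an MSE cost; the model, the data, and the names `cost` and `cost_derivative` are all hypothetical and only meant to show how the nudge and the derivative relate. The ratio of (change in cost) to (change in weight) approaches the derivative as the nudge gets smaller, so the derivative gives the same direction as the nudge test, plus how steep the cost is at that point, without having to evaluate the cost twice for every weight.

```python
def cost(w, xs, ys):
    """MSE of the toy model y_hat = w * x (assumed model, for illustration)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cost_derivative(w, xs, ys):
    """Derivative of the MSE above with respect to w, worked out by hand."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Made-up data that the weight w = 2 would fit exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.5      # current weight
eps = 1e-6   # the "little bit"

# Slope estimated by nudging the weight and watching the cost change.
nudge_slope = (cost(w + eps, xs, ys) - cost(w, xs, ys)) / eps

# Slope given directly by the derivative.
analytic_slope = cost_derivative(w, xs, ys)

print("slope from nudging:   ", nudge_slope)
print("slope from derivative:", analytic_slope)
```

Both slopes come out negative here, meaning the cost goes down when the weight goes up, which matches the intuition in the question. The size of the slope is the extra information: it is what a gradient descent update scales by the learning rate to decide how far to move the weight.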