Gradient of params - Lesson 4

Hi Community,

I am just studying lesson 4 and arrived at the part where gradient descent is calculated.
I am very confused about the part where the gradient of the params is taken instead of the gradient of the loss function, especially since I don't really get what a "gradient of parameters" even is.

I understand that the loss function is a chain of functions, loss(params) = mse(f(t, params)), and that its gradient is calculated via the chain rule. But why are we not doing params = params - lr * loss.gradient?

Does this sidebar info help?

a: This API might throw you off if you’re coming from math or physics. In those contexts the “gradient” of a function is just another function (i.e., its derivative), so you might expect gradient-related APIs to give you a new function. But in deep learning, “gradient” usually means the value of a function’s derivative at a particular argument value. The PyTorch API also puts the focus on the argument, not on the function you’re actually computing the gradients of. It may feel backwards at first, but it’s just a different perspective.
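To make that concrete, here is a minimal plain-Python sketch (no PyTorch, and a made-up toy model `f(t, w) = w * t`, not the lesson's actual notebook code) showing that "the gradient of the params" is just loss′ evaluated at the current parameter value, and that the update step uses that number:

```python
# Toy setup: model f(t, w) = w * t, data points t = [1, 2] with targets y = [2, 4].
TS, YS = [1.0, 2.0], [2.0, 4.0]

def loss(w):
    # mse(f(t, w)) = mean((w*t - y)^2) over the data
    return sum((w * t - y) ** 2 for t, y in zip(TS, YS)) / len(TS)

def loss_grad(w):
    # analytic derivative: d/dw mean((w*t - y)^2) = mean(2*t*(w*t - y))
    return sum(2 * t * (w * t - y) for t, y in zip(TS, YS)) / len(TS)

w = 0.0              # current parameter value
g = loss_grad(w)     # "the gradient of the params": loss'(w) at w = 0.0, a number
w = w - 0.1 * g      # the update step: params = params - lr * grad
```

So the update really is using the gradient of the loss; it's just that the gradient is a *value* computed at the current `w`, and (in PyTorch) it gets stored on the parameter tensor itself.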


Thank you! It did throw me off for exactly that reason :slight_smile: So basically this means the gradient of the parameters is loss′(parameters)?