Hello. In the 4th notebook about MNIST basics, there is a line of code that defines a loss function before going through the end-to-end stochastic gradient descent example. That line is:

The function is named mse, but the formula clearly computes the square root of the MSE! Wouldn’t it be more accurate either to name the function rmse (root mean squared error) or to remove the .sqrt() part?

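I’d agree with you that the loss here is actually RMSE, not MSE. However, it doesn’t matter very much in practice: the square root is monotonic, so both losses are minimized by the same parameters. (The MSE just “punishes” large errors more heavily than the RMSE does.)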
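To make the difference concrete, here is a minimal sketch in plain Python. The mse and rmse helpers and the toy numbers are made up for illustration and are not taken from the notebook.

```python
import math

def mse(preds, targets):
    # Mean of the squared errors.
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def rmse(preds, targets):
    # Square root of the MSE, so it is in the same units as the targets.
    return math.sqrt(mse(preds, targets))

# Hypothetical data: every prediction is off by 1, then by 10.
targets = [0.0, 0.0, 0.0, 0.0]
small_errors = [1.0, 1.0, 1.0, 1.0]
large_errors = [10.0, 10.0, 10.0, 10.0]

mse_small, mse_large = mse(small_errors, targets), mse(large_errors, targets)
rmse_small, rmse_large = rmse(small_errors, targets), rmse(large_errors, targets)

print(mse_small, mse_large)    # 1.0 100.0
print(rmse_small, rmse_large)  # 1.0 10.0
```

Making the error 10 times larger makes the MSE 100 times larger but the RMSE only 10 times larger. Both are smallest for the same predictions, so the choice mostly affects the scale of the loss and of its gradients.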

Also, in Step 3, the initial loss value is 25823.8086, which is clearly not something that you get after calculating the square root of the mean of the squared errors.

loss = mse(preds, speed)
loss
Out: tensor(25823.8086, grad_fn=<MeanBackward0>)

Instead, I assume that this is the value that you get with MSE, not RMSE… This is so convoluted and not beginner-friendly…

I don’t know exactly which notebook this is, but note that the grad_fn says MeanBackward0. That makes me believe the mean() operation was the last thing executed. Otherwise it would have said something like SqrtBackward. (I may be wrong about this, but it does look like the sqrt was never applied.)
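One way to check the grad_fn reasoning is a small PyTorch sketch like the one below. The two loss definitions are my own guesses at the two variants being discussed (with and without .sqrt()), not the notebook's actual code, and the tensors are toy values. The exact class-name suffix (such as the trailing 0) may vary between PyTorch versions.

```python
import torch

def mse(preds, targets):
    # Assumed variant WITHOUT the square root: mean() is the last op.
    return ((preds - targets) ** 2).mean()

def rmse(preds, targets):
    # Assumed variant WITH the square root: sqrt() is the last op.
    return ((preds - targets) ** 2).mean().sqrt()

# Toy values; requires_grad=True so autograd records the operations.
preds = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
targets = torch.tensor([1.5, 2.5, 2.0])

mse_loss = mse(preds, targets)
rmse_loss = rmse(preds, targets)

# grad_fn refers to the last operation that produced the tensor.
mse_name = type(mse_loss.grad_fn).__name__
rmse_name = type(rmse_loss.grad_fn).__name__
print(mse_name)   # expected to start with "MeanBackward"
print(rmse_name)  # expected to start with "SqrtBackward"
```

If the loss in the output above had gone through .sqrt(), its grad_fn should name a sqrt operation rather than a mean, which is consistent with the loss having been computed without the square root.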