Lesson 4 - Official Topic

The median isn’t differentiable everywhere, and where it is, its gradient only flows through the middle sample, so the rest of the batch gets no gradient at all. That’s why we take the mean. Also, you want the points that are badly mispredicted to give big gradients, so that your model gets better. Conversely, samples that are predicted correctly won’t contribute much to the gradients, which is also what we want.

The idea is that even if you have just one wrongly predicted sample, it’s good that it drags your loss up, because that gives your model a chance to become more accurate.
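
If it helps to see this with numbers, here is a rough sketch in plain Python (no PyTorch, so it runs anywhere). It assumes a squared-error loss per sample and uses made-up predictions and targets; the helper names `per_sample_losses` and `numerical_grad` are just mine for illustration. It estimates the gradient of the reduced loss with respect to each prediction by finite differences, once with the mean and once with the median.

```python
from statistics import mean, median

def per_sample_losses(preds, targets):
    # Squared error for each sample (an assumed loss, chosen for simplicity).
    return [(p - t) ** 2 for p, t in zip(preds, targets)]

def numerical_grad(reduce_fn, preds, targets, eps=1e-6):
    # Central finite-difference estimate of d(reduce_fn(losses)) / d(pred_i).
    grads = []
    for i in range(len(preds)):
        up = list(preds)
        up[i] += eps
        down = list(preds)
        down[i] -= eps
        diff = (reduce_fn(per_sample_losses(up, targets))
                - reduce_fn(per_sample_losses(down, targets)))
        grads.append(diff / (2 * eps))
    return grads

# Made-up data: four reasonable predictions and one badly wrong one (the last).
targets = [1.0, 1.0, 1.0, 1.0, 1.0]
preds = [1.1, 0.8, 1.3, 1.0, 4.0]

mean_grads = numerical_grad(mean, preds, targets)
median_grads = numerical_grad(median, preds, targets)

print("gradients with mean:  ", [round(g, 4) for g in mean_grads])
print("gradients with median:", [round(g, 4) for g in median_grads])
```

With the mean, every sample gets a gradient proportional to its error, so the badly wrong prediction gets the largest one. With the median, only the sample whose loss happens to sit in the middle gets any gradient, and the badly wrong prediction gets none, so the model would never be pushed to fix it.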