Hi Ezno. I am not a real mathematician, so experts, feel free to make corrections. As I understand it, many different functions can serve as a loss function. It only needs to penalize wrong guesses, reward correct ones, and be differentiable almost everywhere. So the function you posted above would work fine.
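For instance, here's a toy NumPy sketch of my own (not from any article, just to illustrate the requirements): squared error is zero when the guess is exact, grows as the guess gets worse, and is differentiable everywhere, so gradient descent can use it.

```python
import numpy as np

# Toy loss: squared error penalizes predictions far from the label and
# gives (near) zero penalty for correct guesses. Smooth everywhere, so
# gradients are well defined for the optimizer.
def squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

print(squared_error(np.array([1.0, 0.0]), np.array([0.9, 0.2])))  # good guesses -> small loss (0.025)
print(squared_error(np.array([1.0, 0.0]), np.array([0.1, 0.8])))  # bad guesses -> large loss (0.725)
```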
See this article for examples of standard loss functions:
Each loss function has its own justifications and characteristics: how sharply it separates classes, for example, and how heavily it penalizes outliers. I've used hinge loss and found it trained to higher accuracy on one problem.
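Roughly, hinge loss looks like this (my own sketch, assuming binary labels in {-1, +1} and raw model scores): it charges nothing once an example is on the correct side of the margin, and a linear penalty otherwise. Note the kink at the margin, which is why "differentiable almost everywhere" is good enough in practice.

```python
import numpy as np

# Hinge loss for binary labels in {-1, +1}: zero when y * score >= 1
# (correct and outside the margin), linear penalty otherwise.
def hinge_loss(y_true, scores):
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

y = np.array([1, -1, 1])
scores = np.array([2.0, -0.5, 0.3])   # raw model outputs (margins)
print(hinge_loss(y, scores))          # 0.4; only the last two examples contribute
```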
Cross entropy loss is the loss function most often used for classification in machine learning. Its strength is that it measures the divergence between the true (but unknown) probability distribution and the model's predicted distribution. Here I am way out of my depth mathematically. But if you want to jump down the math rabbit hole…
and here’s a more informal explanation:
https://www.quora.com/When-should-you-use-cross-entropy-loss-and-why
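To make the "divergence between distributions" idea concrete, here's a tiny numerical sketch (again my own toy example, not from either link): with one-hot labels, cross entropy reduces to minus the log of the probability the model assigned to the correct class, so confident wrong answers are punished much harder than confident right ones.

```python
import numpy as np

# Cross entropy between a "true" distribution p and a predicted
# distribution q: H(p, q) = -sum(p * log(q)). With a one-hot p this is
# just -log(q[correct class]).
def cross_entropy(p, q, eps=1e-12):
    return -np.sum(p * np.log(q + eps))  # eps guards against log(0)

p = np.array([0.0, 1.0, 0.0])                        # true class is index 1
print(cross_entropy(p, np.array([0.1, 0.8, 0.1])))   # confident and right: ~0.22
print(cross_entropy(p, np.array([0.6, 0.2, 0.2])))   # mostly wrong: ~1.61
```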
To sum up, cross entropy has information-theoretic arguments for being a generally good loss function for classification tasks. But there are many other choices for classification loss functions, each with its own strengths that can be relevant to your particular machine learning problem.