Cross Entropy Demystified

I wrote a Medium article covering the most important questions I had about cross entropy, and laid out a simple, intuitive explanation of the concept. Please read it and let me know what you think, whether I missed anything, or whether you had some of the same questions I had.
Link :


Regarding the loss

The number has to be positive.

I don’t think that’s quite right. The loss is usually positive, but it doesn’t have to be. We only use the loss to compute gradients, so all we really need is a function with a minimum; adding or subtracting a constant shifts its values without changing the gradients. That said, it makes much more sense to call it an “error” when it can’t go below 0.
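To make the point about gradients concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration (the toy squared-error loss, the target of 3.0, the offset of 10.0, and the helper names); it is not taken from the article. It compares gradient descent on a loss and on the same loss shifted down by a constant:

```python
def squared_error(w, target=3.0):
    # Toy loss: never negative, zero exactly at w == target.
    return (w - target) ** 2

def shifted_loss(w, target=3.0, offset=10.0):
    # Same loss minus a constant, so it is negative near the minimum.
    return squared_error(w, target) - offset

def numerical_grad(f, w, eps=1e-5):
    # Central-difference estimate of df/dw.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def gradient_descent(f, w=0.0, lr=0.1, steps=200):
    # Plain gradient descent on a single scalar parameter.
    for _ in range(steps):
        w -= lr * numerical_grad(f, w)
    return w

w_plain = gradient_descent(squared_error)
w_shifted = gradient_descent(shifted_loss)

print(w_plain, squared_error(w_plain))
print(w_shifted, shifted_loss(w_shifted))
```

Both runs should settle at about the same `w`, even though the second loss is negative there. The sign of the loss value does not matter to the optimizer, only the shape of the function.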


I don’t fully follow. If you mean losses can be negative in the sense that the function can be flipped upside down and we follow the gradients uphill to find a maximum, then I understand. But in the traditional sense, where we are trying to minimize the loss, can the loss still be negative?
As I understand it, the loss is in essence the label probability (almost always 1 for the correct class in classification) minus the predicted probability (0 <= x <= 1). How would we get a negative loss? And what would a negative loss imply: that our prediction is more correct than the label?
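For reference, cross entropy is usually defined with a log rather than a difference: for a one-hot label it reduces to -log(p), where p is the probability the model assigns to the correct class. A minimal sketch in plain Python, with `cross_entropy` as a hypothetical helper name and made-up probabilities, suggests why that form cannot go negative for valid probabilities:

```python
import math

def cross_entropy(label, predicted):
    # label: one-hot list; predicted: class probabilities that sum to 1.
    # Terms with a zero label contribute nothing, so they are skipped
    # (this also avoids evaluating log(0) for classes that are not the label).
    return -sum(y * math.log(p) for y, p in zip(label, predicted) if y > 0)

label = [0.0, 1.0, 0.0]

confident_right = cross_entropy(label, [0.05, 0.90, 0.05])  # about 0.105
unsure = cross_entropy(label, [0.30, 0.40, 0.30])           # about 0.916
confident_wrong = cross_entropy(label, [0.90, 0.05, 0.05])  # about 2.996
perfect = cross_entropy(label, [0.0, 1.0, 0.0])             # zero

print(confident_right, unsure, confident_wrong, perfect)
```

Since p is at most 1, log(p) is at most 0 and -log(p) is at least 0. The loss reaches 0 only when the model puts all its probability on the correct class, so under this definition a prediction cannot be "more correct" than the label.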
