I am working on learning Neural Networks, and I am a bit unclear on the benefits of cross-entropy loss function for multi-class image classification. I am hoping someone can help point me in the right direction. I am going to outline my thought process and what I think I know.
A simple way to measure loss is to take the difference between the prediction (passed through a sigmoid so each value is between 0 and 1) and my y_truth. In a single-class classifier (taken from the new fastai book by Sylvain and Jeremy at github.com/fastai/fastbook) it looks like this:
```python
def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()
```
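For concreteness, here is that same function run on a tiny made-up batch (the activation values are invented for illustration, not from the book):

```python
import torch

def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()

# Raw activations for 3 images in a single-class task ("is it a 3?")
preds = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1, 0, 1])

# sigmoid squashes to (0,1); torch.where picks the distance from the target
loss = mnist_loss(preds, targets)
print(loss)  # ~0.2552: mean of (1-0.8808), 0.2689, (1-0.6225)
```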
Why should I use cross-entropy rather than extending this? If I take predictions that are all between 0 and 1, why not just take the difference from the targets?
For example if we have 2 images we are classifying into 3 classes, we may have this:
```python
pred = tensor([[.2, .8, .4],
               [.3, .1, .4]])
target = tensor([[0, 1, 0],
                 [0, 0, 1]])
```
The more confident the model is about a wrong class, the more it adds to the loss; the less confident it is about a correct class, the more that adds to the loss as well. We could even let one image have multiple classes by setting more than one column to 1 in the same row. That seems to do what we want: the more we minimize it, the more closely the model matches the targets.
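Applying the mnist_loss idea element-wise to the example above gives exactly the behavior described (a sketch of the approach in the question, assuming the predictions have already been through a sigmoid):

```python
import torch

# The example from above: 2 images, 3 classes, sigmoid-squashed predictions
pred = torch.tensor([[0.2, 0.8, 0.4],
                     [0.3, 0.1, 0.4]])
target = torch.tensor([[0., 1., 0.],
                       [0., 0., 1.]])

# Difference-based loss: distance of each prediction from its 0/1 target
loss = torch.where(target == 1, 1 - pred, pred).mean()
print(loss)  # contributions .2, .2, .4 and .3, .1, .6 -> mean 0.3
```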
My understanding is that cross-entropy does something similar, but rather than squashing each prediction independently between 0 and 1, it turns the predictions for an image into a probability distribution over the classes (so they all sum to 1). To me this seems like roughly the same thing with an extra conversion, and I don't really understand why it would make the model easier or faster to train.
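To make the comparison concrete, here is a sketch of what that conversion looks like in PyTorch on the same made-up activations, treated as raw logits (the target class indices are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

# Raw activations (logits) for 2 images and 3 classes
logits = torch.tensor([[0.2, 0.8, 0.4],
                       [0.3, 0.1, 0.4]])
targets = torch.tensor([1, 2])  # index of the correct class per image

# Softmax turns each row into probabilities that sum to 1
probs = logits.softmax(dim=1)
print(probs.sum(dim=1))  # tensor([1., 1.])

# Cross-entropy is the mean negative log-probability of the correct class
manual = -probs[range(2), targets].log().mean()
builtin = F.cross_entropy(logits, targets)  # takes the logits directly
print(manual, builtin)  # the two match
```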
Can anyone point me in the right direction on where I need to expand or correct my understanding? My goal is to build an understanding of loss functions so that I can judge when and how they should be changed for specific problems.