Suppose you are building a classifier to predict which of the following an image shows:

- Chicken burger
- Beef burger
- Car
- Human

It is reasonable to assume that both **Chicken burger** and **Beef burger** look very similar and distinguishing them is a harder task than distinguishing **Chicken burger** from **Car**.

Now assume we have two Chicken burger images: A and B.

Image A predictions came out as follows:

{“Chicken burger” : 0.4, **“Beef burger” : 0.6**, “Car” : 0, “Human” : 0}

while image B predictions came out as follows:

{“Chicken burger” : 0.4, “Beef burger” : 0, **“Car” : 0.6,** “Human” : 0}

Now, if cross entropy loss is used, then the two cases are penalized the same way; however, it is definitely worse to think that a chicken burger is a car than to think it is a beef burger.
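To make that concrete, here is a minimal sketch in plain Python. It assumes the standard cross entropy with a one-hot target, which reduces to the negative log of the probability assigned to the true class; the helper name `cross_entropy` is just mine for illustration.

```python
import math

def cross_entropy(predicted, true_label):
    # With a one-hot target, cross entropy reduces to -log(p_true),
    # so the probabilities given to the wrong classes do not matter.
    return -math.log(predicted[true_label])

image_a = {"Chicken burger": 0.4, "Beef burger": 0.6, "Car": 0.0, "Human": 0.0}
image_b = {"Chicken burger": 0.4, "Beef burger": 0.0, "Car": 0.6, "Human": 0.0}

loss_a = cross_entropy(image_a, "Chicken burger")
loss_b = cross_entropy(image_b, "Chicken burger")

# Both losses equal -log(0.4), regardless of where the remaining 0.6 went.
print(loss_a, loss_b)
```

Both values come out as -log(0.4), roughly 0.916, because only the 0.4 assigned to the true class enters the loss.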

I have a hunch that accounting for this could improve accuracy on such problems. Are there any loss functions that take class similarity into account? If not, how could I implement one that does?

While writing this post, I had the idea of modeling such problems as a regression problem and picking target values that are close to each other for similar classes, as in

0 for chicken burger

1 for beef burger

5 for car and

9 for human

However, this creates another problem: it implies that a human is farther away from a burger than a car is. I think this can be solved, to some extent, by placing the classes in a 2D space so that human and car are both the same distance from the burgers, while the burgers are close to each other. What do you think?
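To illustrate the 2D idea, here is a rough sketch with hand-picked coordinates. The coordinates and the helper `distance` are made up purely for illustration, and `math.dist` needs Python 3.8 or later.

```python
import math

# Hypothetical 2D targets, chosen by hand for illustration only.
coords = {
    "Chicken burger": (-0.5, 0.0),
    "Beef burger": (0.5, 0.0),
    "Car": (0.0, 5.0),
    "Human": (0.0, -5.0),
}

def distance(a, b):
    # Euclidean distance between the 2D targets of two classes.
    return math.dist(coords[a], coords[b])

burger_gap = distance("Chicken burger", "Beef burger")
car_to_chicken = distance("Car", "Chicken burger")
car_to_beef = distance("Car", "Beef burger")
human_to_chicken = distance("Human", "Chicken burger")
human_to_beef = distance("Human", "Beef burger")

# The burgers sit close together; car and human are equally far from both.
print(burger_gap, car_to_chicken, human_to_chicken)
```

In this particular layout the car and the human end up 10 units apart, about twice as far from each other as either is from the burgers, so it only satisfies the "same distance from the burgers" part of the idea.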