How to penalize misclassification differently?

Assume you are building a classifier to predict whether an image is one of the following:

  • Chicken burger
  • Beef burger
  • Car
  • Human

It is reasonable to assume that a chicken burger and a beef burger look very similar, so distinguishing them is a harder task than distinguishing a chicken burger from a car.

Now assume we have two Chicken burger images: A and B.
Image A's predictions came out as follows:
{“Chicken burger” : 0.4, “Beef burger” : 0.6, “Car” : 0, “Human” : 0}
while image B's predictions came out as follows:
{“Chicken burger” : 0.4, “Beef burger” : 0, “Car” : 0.6, “Human” : 0}

Now, if cross-entropy loss is used, the two cases are penalized the same way, since cross-entropy only looks at the probability assigned to the true class (0.4 in both). However, it is definitely worse to mistake a chicken burger for a car than for a beef burger.

I have a hunch that accounting for this could improve accuracy on such problems. So are there any loss functions that take care of this? Or how would one implement a loss that does?
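For concreteness, one way I could imagine such a loss is to weight the predicted probabilities by a class-to-class cost matrix and minimize the expected misclassification cost. Here is a minimal sketch in PyTorch; the cost values are made up purely to illustrate the idea:

```python
import torch
import torch.nn.functional as F

# Hypothetical cost matrix: COST[i][j] is the penalty for putting probability
# on class j when the true class is i. Class order: chicken burger, beef
# burger, car, human. The numbers are illustrative, not tuned.
COST = torch.tensor([
    [0.0, 1.0, 5.0, 5.0],   # true: chicken burger
    [1.0, 0.0, 5.0, 5.0],   # true: beef burger
    [5.0, 5.0, 0.0, 2.0],   # true: car
    [5.0, 5.0, 2.0, 0.0],   # true: human
])

def cost_sensitive_loss(logits, targets):
    """Expected misclassification cost: sum_j p(j) * COST[true, j]."""
    probs = F.softmax(logits, dim=1)          # (batch, num_classes)
    costs = COST.to(logits.device)[targets]   # cost row for each sample
    return (probs * costs).sum(dim=1).mean()

# The two examples from the post (already probabilities, so no softmax here):
probs_a = torch.tensor([[0.4, 0.6, 0.0, 0.0]])   # mass on "Beef burger"
probs_b = torch.tensor([[0.4, 0.0, 0.6, 0.0]])   # mass on "Car"
true_label = torch.tensor([0])                   # chicken burger
print((probs_a * COST[true_label]).sum(dim=1))   # expected cost ~ 0.6
print((probs_b * COST[true_label]).sum(dim=1))   # expected cost ~ 3.0
```

With this cost matrix, image B is penalized much more than image A even though both assign 0.4 to the true class. In practice one might add such a term on top of the standard cross-entropy rather than replace it entirely.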


While writing this post, I had the idea of modeling such problems as a regression problem and picking values that are close to each other for similar classes, as in:
0 for chicken burger
1 for beef burger
5 for car and
9 for human
However, this creates another problem: it implies that a human is farther away from a burger than a car is. I think this can be solved, to some extent, by modeling the classes in a 2D space, so that human and car are the same distance from each other and from the burgers, while the two burgers stay close together. What do you think?
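To make that concrete, here is a rough sketch of the 2D-coordinates idea in PyTorch, assuming hand-picked coordinates (the numbers are purely illustrative) and a model whose final layer outputs two values:

```python
import torch
import torch.nn.functional as F

# Hand-picked 2D coordinates: the two burgers sit close together, while car
# and human are roughly equidistant from each other and from the burgers.
CLASS_COORDS = torch.tensor([
    [0.0, 0.0],    # chicken burger
    [1.0, 0.0],    # beef burger
    [6.0, 0.0],    # car
    [3.0, 5.2],    # human
])

def coord_regression_loss(pred_coords, targets):
    """MSE between the model's predicted 2D point and the true class coordinate."""
    return F.mse_loss(pred_coords, CLASS_COORDS[targets])

def predict_class(pred_coords):
    """At inference time, assign each prediction to the nearest class coordinate."""
    dists = torch.cdist(pred_coords, CLASS_COORDS)   # (batch, num_classes)
    return dists.argmin(dim=1)
```

The loss then naturally punishes "chicken burger predicted as car" more than "chicken burger predicted as beef burger", because the coordinates are farther apart.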


Interesting idea. Have you heard of word embeddings, like the output of word2vec? This would fit right into your concept, but it seems really complicated to get working.


Hmmm, well, embeddings are definitely interesting here. Technically, the 2D representation of the classes is already an embedding, although it is created manually. It would be interesting if one could use a model to create embeddings for the classes, maybe by applying some dimensionality reduction to a pretrained model's features over the entire dataset and then building a regression model that predicts the embedding (class) of an image. Roughly something like the sketch below.
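A minimal sketch of that idea in PyTorch, assuming a torchvision ResNet-18 backbone and a standard (images, labels) DataLoader; these choices are just assumptions, not a fixed recipe:

```python
import torch
import torchvision.models as models

# Learned class embeddings: average a pretrained backbone's features per class,
# then reduce the class means to a low dimension with PCA.
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()      # drop the classifier head, keep 512-d features
backbone.eval()

@torch.no_grad()
def class_embeddings(loader, num_classes, dim=2):
    """Return a (num_classes, dim) embedding for each class."""
    sums = torch.zeros(num_classes, 512)
    counts = torch.zeros(num_classes, 1)
    for images, labels in loader:       # standard DataLoader yielding (images, labels)
        feats = backbone(images)        # (batch, 512) features
        sums.index_add_(0, labels, feats)
        counts.index_add_(0, labels, torch.ones(len(labels), 1))
    means = sums / counts               # per-class mean feature
    centered = means - means.mean(dim=0)
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)   # PCA via SVD
    return centered @ vh[:dim].T        # project class means onto top `dim` directions
```

These embeddings could then replace the hand-picked 2D coordinates as regression targets.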

That is exactly what I was thinking about, but I was too lazy to formulate it. I don't know if anyone has tried that yet, but I'm really not on the bleeding edge. That seems like an interesting thing to try out.