Hi experts and math heads,

Here’s a question that has been bugging my brain for a while, one that I’d ask the instructor if I were in a class.

Let’s take lesson3-camvid-tiramisu to be concrete. I used PyCharm to trace exactly what happens during training. The FlattenedLoss is computed for each training example (per minibatch, actually). The input from the model is a set of activations of shape 12x180x240: one activation for each of the 12 classes, for each pixel. The target has shape 180x240: the index of the correct class for each pixel.

Next, the activations are exponentiated and normalized to sum to 1 (softmax across the 12-class dimension), giving each pixel a probability distribution over the classes. Then the log of these probabilities is returned.
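To make the softmax-then-log step concrete, here is a minimal sketch in plain Python for a single pixel. It is my own illustration, not the library code: the function name `log_softmax` and the activation values are made up, and I use 3 classes instead of 12 to keep it short.

```python
import math

def log_softmax(activations):
    """Log of the softmax over one pixel's class activations."""
    # Subtracting the max first is a common numerical-stability trick;
    # it does not change the mathematical result.
    m = max(activations)
    log_sum = m + math.log(sum(math.exp(a - m) for a in activations))
    return [a - log_sum for a in activations]

# Hypothetical activations for one pixel (3 classes for brevity).
pixel = [2.0, 0.5, -1.0]
log_probs = log_softmax(pixel)

# Exponentiating the log probabilities recovers the softmax probabilities.
probs = [math.exp(lp) for lp in log_probs]
print(log_probs)
print(probs, sum(probs))
```

The probabilities sum to 1 (up to rounding), and every log probability is at most 0.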

Next, nllLoss takes the log probabilities and the targets, computes the cross entropy loss for each pixel, and returns some kind of sum or average of those losses for the image example.
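As a rough sketch of what I understand this step to do: for each pixel, pick out the log probability of the correct class, negate it, and then reduce over pixels. The averaging here is an assumption on my part (a sum would be the other option), and the names `nll_loss`, `activations`, and `targets` plus the toy values are hypothetical.

```python
import math

def log_softmax(activations):
    """Log of the softmax over one pixel's class activations."""
    m = max(activations)
    log_sum = m + math.log(sum(math.exp(a - m) for a in activations))
    return [a - log_sum for a in activations]

def nll_loss(log_probs_per_pixel, targets):
    """Mean negative log probability of the correct class (assumed reduction)."""
    per_pixel = [-lp[t] for lp, t in zip(log_probs_per_pixel, targets)]
    return sum(per_pixel) / len(per_pixel)

# A toy "image" with 2 pixels and 3 classes (hypothetical values).
activations = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]  # index of the correct class for each pixel

log_probs = [log_softmax(a) for a in activations]
loss = nll_loss(log_probs, targets)
print(loss)
```

The second pixel has equal activations for all three classes, so its per-pixel loss is log(3) regardless of which class is correct.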

Here’s where I get hung up. You could have two pixels right next to each other, one whose activations are all much smaller than the other's by the same constant offset. Yet after exponentiating, normalizing, and taking the log, you get exactly the same log probabilities, and therefore exactly the same cross entropy loss and class probability map.
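Here is a small sketch of the effect I mean, with made-up numbers. Shifting every activation of a pixel down by the same constant leaves the log probabilities unchanged (up to floating-point rounding), whereas scaling the activations down does change them. The pixel values and variable names are hypothetical.

```python
import math

def log_softmax(activations):
    """Log of the softmax over one pixel's class activations."""
    m = max(activations)
    log_sum = m + math.log(sum(math.exp(a - m) for a in activations))
    return [a - log_sum for a in activations]

pixel_a = [2.0, 0.5, -1.0]
pixel_b = [a - 10.0 for a in pixel_a]   # same pattern, all values shifted down by 10
pixel_c = [0.1 * a for a in pixel_a]    # same pattern, scaled down by 10x

lp_a = log_softmax(pixel_a)
lp_b = log_softmax(pixel_b)
lp_c = log_softmax(pixel_c)

# Shifted pixel: same log probabilities as the original.
same_after_shift = all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(lp_a, lp_b))
# Scaled pixel: a flatter, less confident distribution.
same_after_scale = all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(lp_a, lp_c))

print(same_after_shift, same_after_scale)
```

So the loss only sees the differences between a pixel's class activations, not their overall level.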

So I am looking for an intuitive sense of: what does it **mean** that one pixel’s activation is lower, even though they map into identical probabilities and loss? I understand the math, and can read and obey the docs. Rather, I want a way to make sense of this apparent inconsistency. The same question came up way back in Lesson 1 of 2018. Maybe those images of logos that get strongly classified as ‘dog’ actually have very low, skewed activations. Might they be somehow classified as “none of the above”?

Well, I feel vulnerable throwing my confusion and geeky question out to the court of forum opinion. But if anyone can shine some intuitively plausible light, I’d be relieved. Thanks!