I don't understand the results for the losses. I would have taken loss = 1 - (softmax value of the correct class) instead of loss = the softmax value of the correct class.

As it is written, a perfect prediction would also give a higher loss value. Shouldn't it be the opposite?

I guess I missed something but can't figure out what. Can someone please clarify?

It is the opposite. You don't have a "perfect prediction" anywhere; you have predicted the likelihood of your input being a 3 or a 7, and some of those predictions are correct.

In row 4 (idx=3) where it was really sure and correct, the probability was .99664 so the loss was very low.

In row 2 (idx=1) it was correct but not very sure: the probability was .502065, which is pretty much a 50/50 guess, so the loss was much higher.

We do indeed have a probability of 0.99664 of being a 3, but targ is 1. I assumed that targ equal to 1 corresponds to 7s and targ equal to 0 corresponds to 3s. Isn't that the case? I can't find any mention of that in the text…

Yes, your understanding is correct: the model wrongly predicted the entry in row 4 as a 3, and as shown on page 200 the softmax + NLL value is -0.00336017, since NLL in PyTorch just adds the negative sign to the input.

However, if you apply log_softmax + NLL (which is the actual formula for cross entropy used in PyTorch), the value for this entry becomes 5.6958 (i.e., -log(0.00336)), as shown on page 202. The wrong prediction is thereby penalized with a higher loss.
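
To make the two numbers concrete, here is a minimal sketch in plain Python rather than PyTorch itself. The helper `nll` is a hypothetical stand-in that only mimics what NLL does per item (pick the entry at the target index and negate it), and the probability for the 3 column is assumed to be 1 - 0.00336017 so that the row sums to 1.

```python
import math

# Softmax output for the row discussed above, as [P(3), P(7)].
# The first value is an assumption: 1 minus the quoted 0.00336017.
probs = [1 - 0.00336017, 0.00336017]
targ = 1  # assumed to index the second column (7), as in the thread


def nll(inputs, target):
    """Hypothetical stand-in for per-item NLL: pick the target entry and negate it."""
    return -inputs[target]


# NLL applied directly to softmax values: just the negated probability.
loss_on_softmax = nll(probs, targ)

# NLL applied to log-softmax values: this is cross entropy.
log_probs = [math.log(p) for p in probs]
loss_on_log_softmax = nll(log_probs, targ)

# Same row, but pretending the target had been 0 (a confident, correct guess).
loss_if_correct = nll(log_probs, 0)

print(round(loss_on_softmax, 8))      # -0.00336017
print(round(loss_on_log_softmax, 4))  # 5.6958
print(loss_if_correct)                # small positive number, close to 0
```

With the log in place, a confident wrong answer costs about 5.7 while a confident right answer costs almost nothing, which is the ordering the original question expected.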