I don’t get how the softmax is calculated in Chapter 5 (pg 199 in book). Here is what I see, maybe you can help.
The activations are in sm_acts (col 0 is 3s, col 1 is 7s):
sm_acts
tensor([[0.6025, 0.3975],
        [0.5021, 0.4979],
        [0.1332, 0.8668],
        [0.9966, 0.0034],
        [0.5959, 0.4041],
        [0.3661, 0.6339]])
The targets are in targ:
targ = tensor([0,1,0,1,1,0])
The loss is:
idx = range(6)
sm_acts[idx, targ]
tensor([0.6025, 0.4979, 0.1332, 0.0034, 0.4041, 0.3661])
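To check that I am reading the indexing correctly, here is a small plain-Python sketch of what I think sm_acts[idx, targ] does: for each row i it picks the column given by targ[i]. The lists are just stand-ins for the tensors above (no PyTorch involved), and the name picked is mine, not from the book.

```python
# Plain-Python stand-ins for the tensors shown above (assumption: same values,
# lists instead of torch tensors).
sm_acts = [
    [0.6025, 0.3975],
    [0.5021, 0.4979],
    [0.1332, 0.8668],
    [0.9966, 0.0034],
    [0.5959, 0.4041],
    [0.3661, 0.6339],
]
targ = [0, 1, 0, 1, 1, 0]

# Pair row i with column targ[i], which is how I understand sm_acts[idx, targ].
picked = [row[t] for row, t in zip(sm_acts, targ)]
print(picked)  # [0.6025, 0.4979, 0.1332, 0.0034, 0.4041, 0.3661]
```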
My question: Take the first row. If the target is ‘0’ and our prediction is 0.6025 (from col 0), shouldn’t the loss be 1 - 0.6025 = 0.3975?
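To make the comparison concrete, here is a rough sketch of the value I expected (1 minus the probability of the target class) for every row, again with plain lists standing in for the tensors. The names one_minus and other_col are hypothetical, just for illustration. Since each row has two columns that sum to 1, my expected value is the same as the activation in the non-target column.

```python
# Same stand-in data as in the post (assumption: lists instead of tensors).
sm_acts = [
    [0.6025, 0.3975],
    [0.5021, 0.4979],
    [0.1332, 0.8668],
    [0.9966, 0.0034],
    [0.5959, 0.4041],
    [0.3661, 0.6339],
]
targ = [0, 1, 0, 1, 1, 0]

# Activation of the target class for each row.
picked = [row[t] for row, t in zip(sm_acts, targ)]

# What I expected the loss to be: 1 - p, rounded to 4 places to hide float noise.
one_minus = [round(1 - p, 4) for p in picked]

# Activation of the *other* class; with two columns this matches 1 - p.
other_col = [row[1 - t] for row, t in zip(sm_acts, targ)]

print(one_minus)  # [0.3975, 0.5021, 0.8668, 0.9966, 0.5959, 0.6339]
print(other_col)  # [0.3975, 0.5021, 0.8668, 0.9966, 0.5959, 0.6339]
```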
Thanks much!