Confused on softmax table in Chapter 5

I don’t get how the softmax is calculated in Chapter 5 (pg 199 in book). Here is what I see, maybe you can help.

The activations are in sm_acts (col 0 is 3s, col 1 is 7s)

sm_acts
tensor([[0.6025, 0.3975],
        [0.5021, 0.4979],
        [0.1332, 0.8668],
        [0.9966, 0.0034],
        [0.5959, 0.4041],
        [0.3661, 0.6339]])

The targets are in targ:

targ = tensor([0,1,0,1,1,0])

The loss is:

idx = range(6)
sm_acts[idx, targ]
tensor([0.6025, 0.4979, 0.1332, 0.0034, 0.4041, 0.3661])

My question: Take the first row. If the target is ‘0’, and our prediction is 0.6025 (from col 0), shouldn’t the loss be 1 - 0.6025 = 0.3975?

Thanks much!

Good question. The answer is on the following pages of the book (200-201). The values picked out by sm_acts[idx, targ] are not the loss; they are an intermediate step in calculating it. Next you take the log of those activations and negate it, which gives the negative log likelihood (NLL) loss. If the softmax activation for the correct class is close to 1, its log is close to zero (log of 1 = 0), so the loss is very low. If the activation is close to 0, the negative log becomes very large, so confident wrong predictions are penalized heavily.
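To make that concrete, here is a minimal sketch using the numbers from the question. It uses plain Python lists and math.log instead of PyTorch tensors so it runs without any dependencies; the names picked, losses and mean_loss are just illustrative and are not from the book.

```python
import math

# Softmax activations copied from the question (col 0 = 3s, col 1 = 7s).
sm_acts = [
    [0.6025, 0.3975],
    [0.5021, 0.4979],
    [0.1332, 0.8668],
    [0.9966, 0.0034],
    [0.5959, 0.4041],
    [0.3661, 0.6339],
]
targ = [0, 1, 0, 1, 1, 0]

# Equivalent of sm_acts[idx, targ]: for each row, pick the activation
# in the column of the correct class.
picked = [row[t] for row, t in zip(sm_acts, targ)]

# The per-item loss is the negative natural log of the picked activation.
losses = [-math.log(p) for p in picked]

# Averaging over the items gives a single loss value for the batch.
mean_loss = sum(losses) / len(losses)

for p, loss in zip(picked, losses):
    print(f"activation={p:.4f}  1-p={1 - p:.4f}  -log(p)={loss:.4f}")
print(f"mean loss: {mean_loss:.4f}")
```

For the first row this gives a loss of roughly 0.51 rather than 0.3975. Both 1 - p and -log(p) shrink as the prediction improves, but -log(p) grows much faster when the model is confidently wrong: the fourth row, with an activation of 0.0034, gets a loss above 5, whereas 1 - p could never exceed 1.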
