I have a question concerning the negative log likelihood loss function defined in lesson 6:
```python
def nll_loss_seq(inp, targ):
    sl, bs, nh = inp.size()
    targ = targ.transpose(0, 1).contiguous().view(-1)
    return F.nll_loss(inp.view(-1, nh), targ)
```
The output of the model has the size `torch.Size([8, 512, 85])` (8 timesteps, bs = 512, and 85 being the embedding size).
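To make the shape bookkeeping concrete, here is a minimal sketch of what the loss function does to those tensors. It uses NumPy purely to illustrate the reshapes (the real code uses PyTorch's `view`/`transpose` and `F.nll_loss`); the sizes 8, 512, 85 match the output above, and `nv` is just my name for that last dimension:

```python
import numpy as np

sl, bs, nv = 8, 512, 85  # timesteps, batch size, size of the last dimension

inp = np.zeros((sl, bs, nv))           # model output: (sl, bs, nv)
targ = np.zeros((bs, sl), dtype=int)   # targets arrive as (bs, sl)

# inp.view(-1, nh): collapse timesteps and batch into one axis of rows,
# each row holding the per-class scores for one (timestep, sample) pair
flat_inp = inp.reshape(-1, nv)

# targ.transpose(0, 1).view(-1): bring targets into the same (sl, bs)
# order as the input before flattening, so row i of flat_inp lines up
# with element i of flat_targ
flat_targ = targ.transpose(1, 0).reshape(-1)

assert flat_inp.shape == (sl * bs, nv)   # (4096, 85)
assert flat_targ.shape == (sl * bs,)     # (4096,)
```

So whatever the last dimension is called, it is the per-position score vector that `F.nll_loss` consumes, one row per (timestep, sample) pair.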
So is it truly `nh` in the loss function, or shouldn't it actually be `n_fac = 85`? Of course it doesn't change anything, as it is only a variable name… but for understanding purposes.
What do you think?