def nll_loss_seq(inp, targ):
    sl,bs,nh = inp.size()
    targ = targ.transpose(0,1).contiguous().view(-1)
    return F.nll_loss(inp.view(-1,nh), targ)
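I’m a bit puzzled by the naming here. Pedagogically speaking, shouldn’t the first line of the function be sl,bs,vocab = inp.size() instead? As far as I can tell, the last dimension of inp is the vocabulary size (one log-probability per token), not the number of hidden units.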
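To make the shapes concrete, here is a minimal sketch of what I mean. It assumes inp holds log-probabilities of shape (sl, bs, vocab) and targ holds token indices of shape (bs, sl); the sizes below are made up for illustration, and the renamed variable is my suggestion rather than what the original code uses.

```python
import torch
import torch.nn.functional as F

def nll_loss_seq(inp, targ):
    # Assumed shapes: inp is (sl, bs, vocab) log-probabilities,
    # targ is (bs, sl) integer token indices.
    sl, bs, vocab = inp.size()
    # Transpose targets to (sl, bs) so they line up with inp, then flatten.
    targ = targ.transpose(0, 1).contiguous().view(-1)
    # Flatten inp to (sl*bs, vocab): one row of log-probabilities per token.
    return F.nll_loss(inp.view(-1, vocab), targ)

# Hypothetical sizes, chosen only to exercise the function.
sl, bs, vocab = 8, 4, 10
torch.manual_seed(0)
inp = F.log_softmax(torch.randn(sl, bs, vocab), dim=-1)
targ = torch.randint(0, vocab, (bs, sl))
loss = nll_loss_seq(inp, targ)
print(loss.shape)  # scalar tensor
```

The behaviour is the same whatever the third variable is called; the name only affects how easy the reshape is to follow.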