DeepLearning-LecNotes6

import torch.nn.functional as F

def nll_loss_seq(inp, targ):
    # inp: (sequence_length, batch_size, vocab_size) log-probabilities from the decoder
    sl, bs, nh = inp.size()
    # reorder targets from (batch, seq) to (seq, batch), then flatten to match inp
    targ = targ.transpose(0, 1).contiguous().view(-1)
    return F.nll_loss(inp.view(-1, nh), targ)

I’m a bit puzzled: pedagogically speaking, shouldn’t this be sl,bs,vocab = inp.size() instead? The last dimension of the decoder output is the vocabulary size, not the number of hidden units, so calling it nh seems misleading.
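To make the shape question concrete, here is a minimal sketch with made-up sizes (sl=7, bs=4, vocab=10 are hypothetical, chosen only for illustration) showing that the third dimension of inp must be the vocabulary size for the flattened F.nll_loss call to line up:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes, for illustration only.
sl, bs, vocab = 7, 4, 10  # sequence length, batch size, vocabulary size

# Decoder output: log-probabilities over the vocabulary at each timestep.
inp = F.log_softmax(torch.randn(sl, bs, vocab), dim=-1)
# Targets arrive as (batch, seq): one token index per position.
targ = torch.randint(0, vocab, (bs, sl))

# Same steps as nll_loss_seq: align targets with inp, then flatten both.
flat_targ = targ.transpose(0, 1).contiguous().view(-1)   # shape (sl * bs,)
loss = F.nll_loss(inp.view(-1, vocab), flat_targ)        # scalar loss
```

The reshape only works because inp.view(-1, vocab) and the flattened targets both enumerate positions in (seq, batch) order, which is why the transpose on targ is needed.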

I did check len(md.val_ds) and what you say seems correct. Check out the image below.


It would be great if somebody could confirm this.