This sounds like it makes sense to me. I don’t know PyTorch well either. Shouldn’t the softmax be over all possible words somehow?