I got it to work by changing the out_sz to 10 (which is my number of classes).
I had confused this variable with the final output I wanted (a single label), but that is not what the last activation function (log_softmax) should produce.
If I understand correctly, the reason is that nll_loss expects an input of shape (bs X num_of_classes), where each column holds the log probability of one class.
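The target (y) should still hold a single value per sample, since nll_loss expects the index of the correct class there. Note that it wants the target as a 1-D tensor of shape (bs), so a (bs X 1) tensor has to be squeezed first.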
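To make the shapes concrete, here is a minimal sketch, not my actual model. The batch size of 4 and the random tensors are assumptions for illustration only; `logits` stands in for whatever the last layer (with out_sz = 10) produces.

```python
import torch
import torch.nn.functional as F

bs, num_classes = 4, 10  # bs is a made-up batch size; 10 classes as in my case

# Stand-in for the raw output of the last layer: one score per class.
logits = torch.randn(bs, num_classes)

# log_softmax over the class dimension gives log probabilities, shape (bs, num_classes).
log_probs = F.log_softmax(logits, dim=1)

# Targets are class indices in [0, num_classes), one per sample, shape (bs,).
target = torch.randint(0, num_classes, (bs,))

# nll_loss picks the log probability of the correct class for each sample
# and averages the negated values into a scalar.
loss = F.nll_loss(log_probs, target)

# The single predicted label per sample comes from argmax, not from the layer size.
pred = log_probs.argmax(dim=1)

print(log_probs.shape, target.shape, loss.shape, pred.shape)
```

So the "1 label" I was after is obtained by taking the argmax over the 10 outputs at prediction time, not by shrinking the last layer to a single unit.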
Hope that helps someone