[Need Help] Loss gradients by sample

To be clear, by "sample" do you mean a single item in your batch? The unreduced loss should be a 1D tensor of length batch_size. By default, fastai/PyTorch reduces that tensor to its mean (or, optionally, its sum) and uses the resulting scalar as the loss.
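
In case it helps, here is a minimal sketch in plain PyTorch of getting the unreduced loss and one gradient per item. The linear model, random data, and sizes are made up for illustration, and I'm assuming a loss that accepts `reduction="none"` (such as `nn.CrossEntropyLoss`); the fastai side is not shown.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup (hypothetical sizes and model, for illustration only).
batch_size, n_features, n_classes = 4, 10, 3
model = nn.Linear(n_features, n_classes)
x = torch.randn(batch_size, n_features)
y = torch.randint(0, n_classes, (batch_size,))

# reduction="none" keeps one loss value per item instead of averaging.
loss_fn = nn.CrossEntropyLoss(reduction="none")
per_sample_loss = loss_fn(model(x), y)  # shape: (batch_size,)

# With the default reduction ("mean") you get a single scalar instead.
mean_loss = nn.CrossEntropyLoss()(model(x), y)

# One gradient per item: differentiate each loss element separately.
# retain_graph=True is needed because all elements share one graph.
params = list(model.parameters())
per_sample_grads = [
    torch.autograd.grad(loss_i, params, retain_graph=True)
    for loss_i in per_sample_loss
]
```

Each entry of `per_sample_grads` is a tuple with one tensor per parameter. The loop is simple but slow for large batches, so treat it as a starting point rather than the recommended way.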