Does anyone know why you need to call x.grad.zero_()
at the end of each iteration? Did I miss something here?
No way to vectorize the training loops?
This looks a lot like the What is torch.nn really? tutorial.
It's done inside PyTorch when you use log softmax.
If you don't, you keep the gradients of the previous batch and the new gradients are added to the old ones. It's how PyTorch works.
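Here's a minimal sketch of that behavior (the tensor name w is just for illustration): each call to .backward() adds into .grad instead of overwriting it, which is why you zero it between iterations.

```python
import torch

w = torch.ones(3, requires_grad=True)

# First backward pass: grad of (w * 2).sum() w.r.t. w is 2 per element.
(w * 2).sum().backward()
print(w.grad)  # tensor([2., 2., 2.])

# Second backward pass without zeroing: the new gradient is ADDED to the old one.
(w * 2).sum().backward()
print(w.grad)  # tensor([4., 4., 4.])

# Zeroing in-place gives a clean slate for the next iteration.
w.grad.zero_()
print(w.grad)  # tensor([0., 0., 0.])
```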
Without adding what back? The LogSumExp trick gives the same result, it's just more numerically stable.
I think that is the idea of the course progression.
Recreating and improving whatâs there, as a learning process.
That's strange to me, but that's… cool… I guess. Is there some use case where you need to accumulate the gradients?
Yes, if you have a very large model and a very low batch size, you'd want to accumulate gradients over a few batches before doing the step.
RL maybe? Something like experience replay, although I can't remember if gradients are stored in memory or actually rewards/weights.
This is done so you can effectively train with larger batches even when the per-step batch size is small (constrained by GPU RAM).
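As a rough sketch of that pattern (the model, data loader, and loss below are toy stand-ins, not from the notebook), you accumulate gradients over several micro-batches and only then take one optimizer step:

```python
import torch
from torch import nn, optim

# Toy stand-ins; in practice these would be your real model, data, and loss.
model = nn.Linear(10, 2)
opt = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(8)]

accum_steps = 4  # effective batch size = 4 micro-batches * 4 samples = 16

opt.zero_grad()
for i, (xb, yb) in enumerate(loader):
    loss = loss_fn(model(xb), yb) / accum_steps  # scale so the sum matches a big-batch mean
    loss.backward()                              # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        opt.step()       # one update per accum_steps micro-batches
        opt.zero_grad()  # clear the accumulated gradients
```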
This has a nice, longer explanation: https://discuss.pytorch.org/t/why-do-we-need-to-set-the-gradients-manually-to-zero-in-pytorch/4903/9
He used the LogSumExp trick identity: log(sum(exp(x_i))) = a + log(sum(exp(x_i - a))), where a = max(x_i).
I have always used this https://jamesmccaffrey.wordpress.com/2016/03/04/the-max-trick-when-computing-softmax/
and I get the same result as PyTorch.
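A small sketch of that max trick (the input values are made up just to force an overflow): subtracting the max before exponentiating keeps exp() in range, and the result matches torch.logsumexp.

```python
import torch

x = torch.tensor([1000.0, 1001.0, 1002.0])  # exp(1000) overflows float32

naive = x.exp().sum().log()  # -> inf, numerically broken

# Max trick: log(sum(exp(x))) = a + log(sum(exp(x - a))), with a = max(x)
a = x.max()
stable = a + (x - a).exp().sum().log()

print(naive)                      # tensor(inf)
print(stable)                     # tensor(1002.4076)
print(torch.logsumexp(x, dim=0))  # same value as the stable version
```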
This is fantastic, thank you (and for everyone else who replied)!
That's because it's also subtracted in the numerator in this link. In the notebook example it wasn't, and in general the LogSumExp trick requires adding it back.
In the case of Log Softmax, this is an additional refactoring that makes it even easier, yes.
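A quick sketch of that refactoring (the comparison against F.log_softmax is just a sanity check, not the notebook's code): since log softmax is log(exp(x) / sum(exp(x))), it simplifies to x - logsumexp(x), so the stabilizing subtraction comes along for free.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 5)

def log_softmax(x):
    # log(exp(x) / sum(exp(x))) = x - logsumexp(x)
    return x - x.logsumexp(dim=-1, keepdim=True)

print(torch.allclose(log_softmax(x), F.log_softmax(x, dim=-1)))  # True
```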
So the main feature of nn.Module
is that it has a __setattr__
that allows you to update the model parameters in a more convenient way?
That and other things, but that's the basic desired feature, yes.
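A minimal sketch of that registration (the class name is made up): assigning an nn.Parameter in __init__ goes through nn.Module.__setattr__, which records it so parameters() / named_parameters() can find it, whereas a plain tensor attribute is not recorded.

```python
import torch
from torch import nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3, 3))  # registered via __setattr__
        self.scale = torch.tensor(2.0)                 # plain tensor: NOT registered

    def forward(self, x):
        return self.scale * (x @ self.weight)

m = TinyModel()
print([name for name, _ in m.named_parameters()])  # ['weight'] -- only the nn.Parameter

# This is what lets the update loop be written as:
# for p in m.parameters(): p -= lr * p.grad
```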
I don't think we've gone over the model.parameters
function… Just making sure we're playing by the rules.
I believe the DummyModule that records the parameters for you was that bit
How different is nn.Parameter from what is shown above?