Lesson 9 Discussion & Wiki (2019)

Does anyone know why you need to call x.grad.zero_() at the end of each iteration? Did I miss something here?

2 Likes

No way to vectorize the training loops?

1 Like

This looks a lot like the “What is torch.nn really?” tutorial.

2 Likes

It’s done inside PyTorch when you use log softmax.

1 Like

If you don’t, you keep the gradients of the previous batch and the new gradients are added to the old ones. It’s how PyTorch works.
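
A quick way to see that accumulation (a minimal sketch, not from the notebook):

```python
import torch

# Gradients accumulate across backward() calls unless you zero them.
x = torch.tensor(3.0, requires_grad=True)

(x * x).backward()
print(x.grad)    # tensor(6.) -- d(x^2)/dx = 2x

(x * x).backward()
print(x.grad)    # tensor(12.) -- the new gradient was added to the old one

x.grad.zero_()   # reset before the next iteration
(x * x).backward()
print(x.grad)    # tensor(6.) again
```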

4 Likes

Without adding what back? The logsumexp trick gives the same result, it’s just more numerically stable.

2 Likes

I think that is the idea of the course progression.

Recreating and improving what’s there, as a learning process.

That’s strange to me – but that’s… cool… I guess. Is there some use case where you need to accumulate the gradients?

Yes, if you have a very large model and a very low batch size, you’d want to accumulate gradients over a few batches before doing the step.
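
Roughly like this (a sketch; `model`, `loss_func`, `opt` and `train_dl` are assumed to exist already):

```python
accum_steps = 4  # number of small batches per effective batch

opt.zero_grad()
for i, (xb, yb) in enumerate(train_dl):
    loss = loss_func(model(xb), yb) / accum_steps  # scale so the sum matches one big batch
    loss.backward()                                # gradients keep accumulating in .grad
    if (i + 1) % accum_steps == 0:
        opt.step()        # update with the accumulated gradients
        opt.zero_grad()   # reset for the next effective batch
```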

6 Likes

RL maybe? Something like experience replay, although I can’t remember whether it’s the gradients that are stored in memory or actually the rewards/weights.

This is done to be able to effectively use larger batches even with a small batch size (constrained by GPU RAM).

2 Likes

This has a nice, longer explanation: https://discuss.pytorch.org/t/why-do-we-need-to-set-the-gradients-manually-to-zero-in-pytorch/4903/9

23 Likes

He used the LogSumExp identity: logsumexp(x) = a + log(sum(exp(x - a))), where a = max(x). I have always used this https://jamesmccaffrey.wordpress.com/2016/03/04/the-max-trick-when-computing-softmax/
and I get the same result as PyTorch.
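
For example (a small check, with made-up values chosen to overflow the naive version):

```python
import torch

x = torch.tensor([1000., 1001., 1002.])

naive = x.exp().sum().log()             # inf: exp(1000) overflows
a = x.max()
stable = a + (x - a).exp().sum().log()  # max trick: finite and correct
print(naive, stable, torch.logsumexp(x, dim=0))
# tensor(inf) tensor(1002.4076) tensor(1002.4076)
```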

2 Likes

This is fantastic, thank you (and for everyone else who replied)!

3 Likes

That’s because it’s also subtracted in the numerator in that link. In the notebook example it wasn’t, and in general the LogSumExp trick requires adding the max back.
In the case of log softmax, this is an additional refactoring that makes it even easier, yes.
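
That refactoring, as a sketch (made-up input, just to check the identity):

```python
import torch

def log_softmax(x):
    # log(softmax(x)) = x - logsumexp(x). Expanding logsumexp with the max trick
    # gives (x - a) - log(sum(exp(x - a))): the max is subtracted in the numerator
    # too, so nothing has to be added back.
    return x - x.logsumexp(dim=-1, keepdim=True)

x = torch.randn(2, 5)
print(torch.allclose(log_softmax(x), torch.log_softmax(x, dim=-1)))  # True
```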

1 Like

So the main feature of nn.Module is that it has a __setattr__ that allows you to update the model parameters in a more convenient way?

That and other things, but that’s the basic desired feature, yes.

1 Like

I don’t think we’ve gone over the model.parameters function… Just making sure we’re playing by the rules :stuck_out_tongue:

I believe the DummyModule that records the parameters for you was that bit :stuck_out_tongue:
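
Something along these lines (a sketch of the idea, not necessarily the exact notebook code):

```python
import torch
from torch import nn

class DummyModule():
    # Records every (sub)module assigned as an attribute, so its parameters can be
    # iterated over later -- the convenience nn.Module's __setattr__ gives you.
    def __init__(self, n_in, nh, n_out):
        self._modules = {}
        self.l1 = nn.Linear(n_in, nh)
        self.l2 = nn.Linear(nh, n_out)

    def __setattr__(self, k, v):
        if not k.startswith('_'): self._modules[k] = v
        super().__setattr__(k, v)

    def parameters(self):
        for m in self._modules.values():
            yield from m.parameters()

mdl = DummyModule(10, 50, 1)
print(sum(1 for p in mdl.parameters()))  # 4: two weights and two biases
```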

How different is nn.Parameter from what was shown above?