Does anyone know why you need to call x.grad.zero_()
at the end of each iteration? Did I miss something here?
No way to vectorize the training loops?
This looks a lot like the What is torch.nn really? tutorial.
It's done inside PyTorch when you use log softmax.
If you don't, you keep the gradients of the previous batch and the new gradients are added to the old ones. It's how PyTorch works.
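Here's a minimal sketch of that behavior (the tensor name w is just for illustration): each call to .backward() adds into .grad instead of overwriting it, which is why you zero it between iterations.

```python
import torch

w = torch.ones(3, requires_grad=True)

# First backward pass: grad of (w * 2).sum() w.r.t. w is 2 per element.
(w * 2).sum().backward()
print(w.grad)  # tensor([2., 2., 2.])

# Second backward pass without zeroing: the new gradient is ADDED to the old one.
(w * 2).sum().backward()
print(w.grad)  # tensor([4., 4., 4.])

# Zeroing in-place gives a clean slate for the next iteration.
w.grad.zero_()
print(w.grad)  # tensor([0., 0., 0.])
```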
Without adding what back? The LogSumExp trick gives the same result, it's just more numerically stable.
I think that is the idea of the course progression.
Recreating and improving whatâs there, as a learning process.
That's strange to me, but that's… cool… I guess. Is there some use case where you need to accumulate the gradients?
Yes, if you have a very large model and a very low batch size, you'd want to accumulate gradients over a few batches before doing the step.
RL maybe? Something like experience replay, although I can't remember if gradients are stored in memory or actually rewards/weights.
This is done so you can effectively train with larger batches even when the per-step batch size is small (constrained by GPU RAM).
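As a rough sketch of that pattern (the model, data loader, and loss below are toy stand-ins, not from the notebook), you accumulate gradients over several micro-batches and only then take one optimizer step:

```python
import torch
from torch import nn, optim

# Toy stand-ins; in practice these would be your real model, data, and loss.
model = nn.Linear(10, 2)
opt = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(8)]

accum_steps = 4  # effective batch size = 4 micro-batches * 4 samples = 16

opt.zero_grad()
for i, (xb, yb) in enumerate(loader):
    loss = loss_fn(model(xb), yb) / accum_steps  # scale so the sum matches a big-batch mean
    loss.backward()                              # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        opt.step()       # one update per accum_steps micro-batches
        opt.zero_grad()  # clear the accumulated gradients
```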
This has a nice, longer explanation: https://discuss.pytorch.org/t/why-do-we-need-to-set-the-gradients-manually-to-zero-in-pytorch/4903/9
He used the LogSumExp trick identity: log(sum(exp(x_i))) = a + log(sum(exp(x_i - a))), where a = max(x_i).
I have always used this https://jamesmccaffrey.wordpress.com/2016/03/04/the-max-trick-when-computing-softmax/
and I get the same result as PyTorch.
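A small sketch of that max trick (the input values are made up just to force an overflow): subtracting the max before exponentiating keeps exp() in range, and the result matches torch.logsumexp.

```python
import torch

x = torch.tensor([1000.0, 1001.0, 1002.0])  # exp(1000) overflows float32

naive = x.exp().sum().log()  # -> inf, numerically broken

# Max trick: log(sum(exp(x))) = a + log(sum(exp(x - a))), with a = max(x)
a = x.max()
stable = a + (x - a).exp().sum().log()

print(naive)                      # tensor(inf)
print(stable)                     # tensor(1002.4076)
print(torch.logsumexp(x, dim=0))  # same value as the stable version
```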
This is fantastic, thank you (and for everyone else who replied)!
That's because it's also subtracted in the numerator in this link. In the notebook example it wasn't, and in general the LogSumExp trick requires adding it back.
In the case of Log Softmax, this is an additional refactoring that makes it even easier, yes.
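A quick sketch of that refactoring (the comparison against F.log_softmax is just a sanity check, not the notebook's code): since log softmax is log(exp(x) / sum(exp(x))), it simplifies to x - logsumexp(x), so the stabilizing subtraction comes along for free.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 5)

def log_softmax(x):
    # log(exp(x) / sum(exp(x))) = x - logsumexp(x)
    return x - x.logsumexp(dim=-1, keepdim=True)

print(torch.allclose(log_softmax(x), F.log_softmax(x, dim=-1)))  # True
```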
So the main feature of nn.Module
is that it has a __setattr__
that allows you to update the model parameters in a more convenient way?
That and other things, but that's the basic desired feature, yes.
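A minimal sketch of that registration (the class name is made up): assigning an nn.Parameter in __init__ goes through nn.Module.__setattr__, which records it so parameters() / named_parameters() can find it, whereas a plain tensor attribute is not recorded.

```python
import torch
from torch import nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3, 3))  # registered via __setattr__
        self.scale = torch.tensor(2.0)                 # plain tensor: NOT registered

    def forward(self, x):
        return self.scale * (x @ self.weight)

m = TinyModel()
print([name for name, _ in m.named_parameters()])  # ['weight'] -- only the nn.Parameter

# This is what lets the update loop be written as:
# for p in m.parameters(): p -= lr * p.grad
```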
I don't think we've gone over the model.parameters
function… Just making sure we're playing by the rules.
I believe the DummyModule that records the parameters for you was that bit
How different is nn.Parameter from what is shown above?