Lesson 11 discussion and wiki

Could we learn the parameters in the step function, as in a meta-learning scenario?

1 Like

Yes, of course they use heavy machinery, but being able to scale to a 64k batch size is quite an achievement.

Not really, as they don’t appear in the loss, so we don’t have gradients for them.

Why not define self.grad_params once in __init__(), when the Optimizer is first constructed?

It’s exactly the same to define it as a computed property, I think.

I think Jeremy meant to say that .add_ is the in-place version of .add in PyTorch.
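For reference, a tiny sketch of the difference (the trailing underscore is PyTorch’s naming convention for in-place operations):

```python
import torch

p = torch.ones(3)
g = torch.full((3,), 0.5)

out = p.add(g)   # out-of-place: returns a new tensor, p is unchanged
p.add_(g)        # in-place: mutates p directly, which is what an optimizer step wants
```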

Really? Seems like he is rebuilding the list of grad_parameters every time he wants to refer to them.

The reason you might generally want to do that is in case self.param_groups or self.hypers changes, for whatever reason, after __init__ is called. Properties (or in this case, a zero-argument method) keep attributes that should stay consistent with each other in sync.

Yes, and it takes essentially no time to do so. Even for a super deep model, you may have around 300 parameters (counting tensors, not individual weights).
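A minimal sketch of the zero-argument-method approach being discussed (the names and structure here are just illustrative, not the notebook’s actual Optimizer code):

```python
class Optimizer:
    def __init__(self, params, **defaults):
        # params is assumed to be a list of parameter groups (lists of tensors)
        self.param_groups = list(params)
        self.hypers = [dict(defaults) for _ in self.param_groups]

    def grad_params(self):
        # Rebuilt on every call, so it stays in sync even if param_groups or
        # hypers change after __init__. Caching this list in __init__ would
        # also work, but only as long as nothing mutates those attributes later.
        return [(p, hyper)
                for pg, hyper in zip(self.param_groups, self.hypers)
                for p in pg if p.grad is not None]
```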

1 Like

:exploding_head:

2 Likes

Shouldn’t the momentum be 0.9*prev + 0.1*new_grad? The equation looks like 0.9*prev + new_grad.

3 Likes

And we made that mistake a lot of times, but basic momentum doesn’t have the 0.1.

1 Like

The journey to deep knowledge seems to be pretty hard :smile:
One more reason to have a critical mind and ask questions.

4 Likes

Pretty hard to explain to the people who evaluate and approve models before moving to production :frowning:

OK, but then you don’t keep the gradient normalized - wouldn’t it keep growing?

Emphasizes the importance of getting the foundations right.

2 Likes

But wouldn’t that make the momentum grow out of control, since we are essentially doubling it every time? 0.9 * old_mom + 1.0 * new_grad as opposed to 0.9 * old_mom + 0.1 * new_grad.

Jeremy is answering this now.

1 Like

That’s what Jeremy is showing. But unless you have dampening, momentum doesn’t have the 0.1. Check the PyTorch source code :wink:
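For what it’s worth, this is roughly the buffer update rule that torch.optim.SGD documents (a sketch of the rule, not the actual source):

```python
def momentum_buffer(buf, grad, momentum=0.9, dampening=0.0):
    # buf = momentum * buf + (1 - dampening) * grad
    # dampening=0.0 -> 0.9 * buf + grad        (basic momentum, the default)
    # dampening=0.1 -> 0.9 * buf + 0.1 * grad  (the "averaging" version asked about above)
    return momentum * buf + (1 - dampening) * grad
```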

3 Likes

You’re not doubling it, since your old contributions are weighted by 0.9**i after i iterations, and that gets to 0 pretty quickly.
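One quick way to see it: with a constant gradient of 1 and no dampening, the buffer settles at the geometric-series limit 1 / (1 - 0.9) = 10 instead of growing forever.

```python
buf = 0.0
for i in range(100):
    buf = 0.9 * buf + 1.0   # constant "gradient" of 1, no 0.1 factor
print(buf)  # ~10.0: 1 + 0.9 + 0.9**2 + ... converges to 1 / (1 - 0.9)
```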

1 Like