Could we learn the parameters used in the step function, as in a meta-learning scenario?
Yes, of course they use heavy machinery, but being able to scale to a 64k batch size is quite an achievement.
Not really, as they don't appear in the loss, so we don't have gradients for them.
Why not define `self.grad_params = ...` during `__init__()` once, when the Optimizer is first constructed?
It's exactly the same to define it as a computed property, I think.
I think Jeremy meant to say `.add_` is the in-place version of `.add` in PyTorch.
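For anyone unfamiliar with the naming convention: in PyTorch the trailing underscore marks the in-place variant. A tiny illustration with made-up tensors:

```
import torch

p = torch.zeros(3)
g = torch.ones(3)

q = p.add(g)  # out-of-place: returns a new tensor, p is unchanged
p.add_(g)     # in-place: modifies p directly, as an optimizer step does
```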
Really? It seems like he is rebuilding the list of `grad_params` every time he wants to refer to them.
The reason why you might generally want to do that is in case, for whatever reason, `self.param_groups` or `self.hypers` changes after calling `__init__`. Properties (or in this case, a zero-argument method) keep attributes that should stay consistent with each other in sync.
Yes, but it doesn't take much time to do so. Even for a super deep model, you may have around 300 parameters (counting the separate tensors).
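A minimal sketch of the computed-property option being discussed (a toy `Optimizer`, not the course's actual class):

```
import torch

class Optimizer:
    def __init__(self, param_groups):
        self.param_groups = param_groups  # list of lists of tensors

    # Rebuilt on every access, so it always reflects the current state
    # of self.param_groups; cheap even for a few hundred tensors.
    @property
    def grad_params(self):
        return [p for pg in self.param_groups
                for p in pg if p.requires_grad]

params = [torch.randn(2, 2, requires_grad=True)]
opt = Optimizer([params])
print(len(opt.grad_params))  # 1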
Shouldn't the momentum be 0.9*prev + 0.1*new_grad? The equation looks like 0.9*prev + new_grad.
And we made that mistake a lot of times, but basic momentum doesn't have the 0.1.
The journey to deep knowledge seems to be pretty hard.
One more reason to have a critical mind and ask questions.
Pretty hard to explain to the people who evaluate and approve models before moving to production.
OK, but then you don't keep the gradient normalized - wouldn't it keep growing?
This emphasizes the importance of getting the foundations right.
But wouldn't that make the momentum grow out of control, since we are essentially doubling it every time? 0.9 * old_mom + 1.0 * new_grad, as opposed to 0.9 * old_mom + 0.1 * new_grad.
Jeremy is answering this now.
That's what Jeremy is showing. But unless you have dampening, momentum doesn't have the 0.1. Check the PyTorch source code.
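For reference, roughly what the momentum buffer update in `torch.optim.SGD` does (paraphrased, not the verbatim source):

```
import torch

momentum, dampening = 0.9, 0.0  # PyTorch defaults dampening to 0
buf = torch.zeros(3)            # momentum buffer
grad = torch.ones(3)

# buf = momentum * buf + (1 - dampening) * grad
buf.mul_(momentum).add_(grad, alpha=1 - dampening)
# With dampening=0 this is 0.9*buf + 1.0*grad, i.e. no 0.1 on the gradient;
# setting dampening=0.1 would give the 0.9/0.1 form asked about above.
```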
You're not doubling it, since your old contributions are weighted by 0.9**i after i iterations, and that gets to 0 pretty quickly.
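To see why it stays bounded, a quick sketch with a made-up constant gradient of 1.0: the buffer converges to the geometric series sum 1/(1 - 0.9) = 10 instead of blowing up.

```
buf = 0.0
for _ in range(100):
    buf = 0.9 * buf + 1.0  # momentum=0.9, constant gradient, no dampening
print(buf)  # ~9.9997, converging to 1/(1-0.9) = 10: bounded, not exploding
```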