Lesson 9 Discussion & Wiki (2019)

Is there an advantage to paying close attention to initialisation vs. using something like batch norm?

2 Likes

Jeremy said that if you forgot something you should go back to the part 1 notebooks, but where are they? Is this https://github.com/fastai/fastai_docs/tree/master/dev_nb correct?

Oh yes, it will change the way your model trains by a lot.

2 Likes

The notebooks from part 1 are here: https://github.com/fastai/course-v3/tree/master/nbs/dl1

3 Likes

In what way? Faster convergence than with batch norm?

1 Like

Why do we want to keep variance = 1 all the time? Is it because of the gradient explosion/vanishing problem?

2 Likes
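Here's a minimal sketch of why the per-layer variance matters (plain PyTorch, my own illustration rather than the lesson notebook; the 512/10,000 sizes are arbitrary): with unscaled weights, each linear layer multiplies the activation std by roughly sqrt(fan_in), so it compounds with depth, while dividing the weights by sqrt(fan_in) keeps the std near 1.

```python
import torch

torch.manual_seed(0)
fan_in = 512
x = torch.randn(10_000, fan_in)                       # activations with mean 0, std 1

w_naive  = torch.randn(fan_in, fan_in)                # std-1 weights, no scaling
w_scaled = torch.randn(fan_in, fan_in) / fan_in**0.5  # scaled roughly Xavier/Kaiming-style

print((x @ w_naive).std().item())    # ~sqrt(512) ≈ 22.6: grows at every layer
print((x @ w_scaled).std().item())   # ~1: stays stable as layers stack
```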

It's only a definition.

Thanks!

What is the link I found then…

How does your computer explode when taking the std deviation and mean again and again?

Jeremy is explaining it now.

I believe it could even be the difference between converging and not converging at all (if you have networks with many layers, as discussed in the papers shared during the previous lesson).

1 Like

Yup, that is so

No, it's multiplying a vector by the same matrix again and again.
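For anyone who missed it, here's a rough sketch of that point (my own version, assuming I followed correctly; the sizes aren't necessarily what Jeremy used): feed a vector through the same unscaled random matrix repeatedly and the activations overflow after a few dozen "layers".

```python
import torch

torch.manual_seed(0)
x = torch.randn(512)          # a "data" vector
w = torch.randn(512, 512)     # unscaled weights, std 1

for i in range(100):
    x = w @ x                 # same matrix applied again and again
    if not torch.isfinite(x).all():
        print(f"activations overflowed to inf/nan at step {i}")
        break
    if i % 10 == 0:
        print(i, x.std().item())   # std grows by ~sqrt(512) each step
```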

This feels important to the discussion, but I think I missed what Jeremy was saying here.

Is there any reason not to initialize each layer and directly scale the weights/biases to achieve 0 mean/1 stdev based on the observed statistics, rather than pre-solving for particular activations/structures?

This isn't really a 'little difference'. Activations/initializations are crucial parts of the network, and probably a detail whose default values the majority of relatively new users never adjust.

Good job!

Is it sqrt(5) or sqrt(3)?

2 Likes
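For context on that question, as I understand the PyTorch defaults (worth double-checking against the source): nn.Linear and nn.Conv2d initialize their weights with kaiming_uniform_(weight, a=math.sqrt(5)), and inside kaiming_uniform_ the bound is gain * sqrt(3 / fan_in), where the sqrt(3) just converts a std into the half-width of a uniform distribution. A small check:

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(512, 512)              # default init: kaiming_uniform_ with a=sqrt(5)

fan_in = 512
gain = math.sqrt(2.0 / (1 + 5))          # leaky_relu gain with negative slope a = sqrt(5)
bound = gain * math.sqrt(3.0 / fan_in)   # half-width of Uniform(-bound, bound)

print(bound)                             # ~0.044
print(layer.weight.abs().max().item())   # just under the same bound
```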

That's what BatchNorm does.

5 Likes
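A small sketch of the contrast being discussed (my own illustration, not lesson code; the 64/512 sizes are arbitrary): BatchNorm normalizes the activations with batch statistics on every forward pass, whereas the idea in the question above, rescaling the weights once based on observed output statistics, is a data-dependent init (LSUV-style).

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(64, 512)
layer = nn.Linear(512, 512)

# BatchNorm: re-normalizes the layer's output with batch stats on every forward pass
bn = nn.BatchNorm1d(512)
out = bn(layer(x))
print(out.mean().item(), out.std().item())            # ~0, ~1

# Data-dependent init (LSUV-style): rescale the weights *once* from observed stats
with torch.no_grad():
    layer.weight /= layer(x).std()
    layer.bias -= layer(x).mean()
print(layer(x).mean().item(), layer(x).std().item())  # ~0, ~1 for data like x
```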

Link to the Twitter thread that Jeremy had mentioned

9 Likes

BatchNorm isn't about init, though; it's applied during net operation, right?