Lesson 9 Discussion & Wiki (2019)

Is there an advantage to paying close attention to initialisation vs. using something like batch norm?

2 Likes

Jeremy said that if you forgot something you should go back to the part 1 notebooks, but where are they? Is this https://github.com/fastai/fastai_docs/tree/master/dev_nb correct?

Oh yes, it will change the way your model trains by a lot.

2 Likes

The notebooks from part 1 are here: https://github.com/fastai/course-v3/tree/master/nbs/dl1

3 Likes

In what way? Faster convergence than with batch norm?

1 Like

Why do we want to keep variance = 1 all the time? Is it because of the gradient explosion/vanishing problem?

2 Likes
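Here's a minimal sketch of why the per-layer variance matters (plain PyTorch, my own illustration rather than the lesson notebook; the 512/10,000 sizes are arbitrary): with unscaled weights, each linear layer multiplies the activation std by roughly sqrt(fan_in), so it compounds with depth, while dividing the weights by sqrt(fan_in) keeps the std near 1.

```python
import torch

torch.manual_seed(0)
fan_in = 512
x = torch.randn(10_000, fan_in)                       # activations with mean 0, std 1

w_naive  = torch.randn(fan_in, fan_in)                # std-1 weights, no scaling
w_scaled = torch.randn(fan_in, fan_in) / fan_in**0.5  # scaled roughly Xavier/Kaiming-style

print((x @ w_naive).std().item())    # ~sqrt(512) ≈ 22.6: grows at every layer
print((x @ w_scaled).std().item())   # ~1: stays stable as layers stack
```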

It's only a definition.

Thanks!

What is the link I found then…

How does your computer explode when taking the std deviation and mean again and again?

Jeremy is explaining it now.

I believe it could even be the difference between converging and not converging at all (if you have networks with many layers, as discussed in the papers shared during the previous lesson).

1 Like

Yup, that is so

No, it's multiplying a vector by the same matrix again and again.
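For anyone who missed it, here's a rough sketch of that point (my own version, assuming I followed correctly; the sizes aren't necessarily what Jeremy used): feed a vector through the same unscaled random matrix repeatedly and the activations overflow after a few dozen "layers".

```python
import torch

torch.manual_seed(0)
x = torch.randn(512)          # a "data" vector
w = torch.randn(512, 512)     # unscaled weights, std 1

for i in range(100):
    x = w @ x                 # same matrix applied again and again
    if not torch.isfinite(x).all():
        print(f"activations overflowed to inf/nan at step {i}")
        break
    if i % 10 == 0:
        print(i, x.std().item())   # std grows by ~sqrt(512) each step
```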

This feels important to the discussion, but I think I missed what Jeremy was saying here.

Is there any reason not to initialize each layer and directly scale the weights/biases to achieve 0 mean/1 stdev based on the observed statistics, rather than pre-solving for particular activations/structures?

This isn't really a 'little difference'. Activations/initializations are crucial parts of the network, and probably a detail whose default values the majority of relatively new users never adjust.

Good job!

Is it sqrt(5) or sqrt(3)?

2 Likes
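For context on that question, as I understand the PyTorch defaults (worth double-checking against the source): nn.Linear and nn.Conv2d initialize their weights with kaiming_uniform_(weight, a=math.sqrt(5)), and inside kaiming_uniform_ the bound is gain * sqrt(3 / fan_in), where the sqrt(3) just converts a std into the half-width of a uniform distribution. A small check:

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(512, 512)              # default init: kaiming_uniform_ with a=sqrt(5)

fan_in = 512
gain = math.sqrt(2.0 / (1 + 5))          # leaky_relu gain with negative slope a = sqrt(5)
bound = gain * math.sqrt(3.0 / fan_in)   # half-width of Uniform(-bound, bound)

print(bound)                             # ~0.044
print(layer.weight.abs().max().item())   # just under the same bound
```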

That's what BatchNorm does.

5 Likes
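A small sketch of the contrast being discussed (my own illustration, not lesson code; the 64/512 sizes are arbitrary): BatchNorm normalizes the activations with batch statistics on every forward pass, whereas the idea in the question above, rescaling the weights once based on observed output statistics, is a data-dependent init (LSUV-style).

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(64, 512)
layer = nn.Linear(512, 512)

# BatchNorm: re-normalizes the layer's output with batch stats on every forward pass
bn = nn.BatchNorm1d(512)
out = bn(layer(x))
print(out.mean().item(), out.std().item())            # ~0, ~1

# Data-dependent init (LSUV-style): rescale the weights *once* from observed stats
with torch.no_grad():
    layer.weight /= layer(x).std()
    layer.bias -= layer(x).mean()
print(layer(x).mean().item(), layer(x).std().item())  # ~0, ~1 for data like x
```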

Link to the Twitter thread that Jeremy had mentioned

9 Likes

BatchNorm isn't about init, though; it's applied during net operation, right?