Lesson 9 Discussion & Wiki (2019)

rachel · March 26, 2019, 1:11am

Please use this thread to discuss lesson 9. Since this is Part 2, feel free to ask more advanced or slightly tangential questions - although if your question is not related to the lesson much at all, please use a different topic. Note that this is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better!

Thread for general chit chat (we won’t be monitoring this).

Lesson resources

Errata

In 02b_initializing.ipynb, a few minor corrections were made where variance and std were mixed up. Thanks to Aman Madaan for pointing those out.
In 03_minibatch_training.ipynb, there is a small error in the Optimizer() class. The step() method should be:

    def step(self):
        with torch.no_grad():
            for p in self.params: p -= p.grad * self.lr

instead of

    def step(self):
        with torch.no_grad():
            for p in self.params: p -= p.grad * lr

(missing self in the learning rate update formula – the method still works because lr was declared as a global variable earlier.)
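
To see why the buggy version goes unnoticed, here is a minimal sketch of the same scoping problem using plain floats instead of tensors. The `Param` class, the two optimizer class names, and the numeric values are hypothetical stand-ins for illustration, not code from the notebook.

```python
lr = 0.5  # a global defined earlier in the session (illustrative value)

class Param:
    """Hypothetical stand-in for a tensor that carries a .grad attribute."""
    def __init__(self, value, grad):
        self.value, self.grad = value, grad

class BuggyOptimizer:
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def step(self):
        # Bug: `lr` resolves to the global above, so self.lr is ignored.
        for p in self.params: p.value -= p.grad * lr

class FixedOptimizer(BuggyOptimizer):
    def step(self):
        # Fix: use the learning rate stored on the optimizer itself.
        for p in self.params: p.value -= p.grad * self.lr

buggy_p, fixed_p = Param(1.0, 1.0), Param(1.0, 1.0)
BuggyOptimizer([buggy_p], lr=0.25).step()  # steps with the global 0.5
FixedOptimizer([fixed_p], lr=0.25).step()  # steps with the requested 0.25
print(buggy_p.value, fixed_p.value)
```

Both versions run without error, which is why the mistake only shows up when the optimizer's learning rate differs from the global one.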

Papers

Self-Normalizing Neural Networks (SELU)
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (orthogonal initialization)
All you need is a good init
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification – 2015 paper that introduced PReLU and Kaiming initialization.
Understanding the difficulty of training deep feedforward neural networks– paper that introduced Xavier initialization
Fixup Initialization: Residual Learning Without Normalization – paper highlighting the importance of initialisation: it trains a 10,000-layer network without normalisation

Notes and other resources

Talks and blog posts

Gabriel_Syme · March 26, 2019, 1:33am

This lecture deserves a How to Train your [Dragon] Model poster, someone more artistic than me please make it

wgpubs · March 26, 2019, 1:39am

What is the refactoring process by which these nb_XX.py python files (and their code which is sometimes duplicated) get turned into things like fastai.text.data and so forth?

sgugger · March 26, 2019, 1:39am

The refactoring happens in the notebooks. We only turned them into a library when they looked nice and cosy.

wgpubs · March 26, 2019, 1:40am

Is that process going to be covered as well?

devforfu · March 26, 2019, 1:41am

I believe it is kind of covered right now =)

sgugger · March 26, 2019, 1:41am

It’s what Jeremy does in each of those notebooks. Later in this lesson, you’ll see a training loop and CallbackHandler even better than what we have inside the library.

username_not_found · March 26, 2019, 1:43am

I don’t understand: wouldn’t it make more sense for the “leak” value to be the -slope, rather than the slope?

neuradai · March 26, 2019, 1:44am

The slope of the left-hand side of the leaky ReLU is still positive.

sgugger · March 26, 2019, 1:46am

Yes, I didn’t catch Jeremy saying minus, but if he said minus, it was just a mistake. The slope on the negative side is still positive.
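
A minimal scalar sketch of the point above, assuming the usual leaky ReLU definition (identity for non-negative inputs, `leak * x` otherwise). The function name and the default `leak` value are illustrative choices, not taken from the notebook.

```python
def leaky_relu(x, leak=0.01):
    """Identity for x >= 0; a line with small positive slope `leak` for x < 0."""
    return x if x >= 0 else leak * x

# On the negative side the outputs are negative, but they still increase
# as x increases, so the slope there is +leak rather than -leak.
left = leaky_relu(-2.0, leak=0.5)
right = leaky_relu(-1.0, leak=0.5)
print(left, right, (right - left) / 1.0)
```

The negative sign comes from the input, not from the leak parameter, which is why the leak is passed as a positive number.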

aamir7117 · March 26, 2019, 1:47am

which notebook did Jeremy say we’re following? 02a…?

sgugger · March 26, 2019, 1:47am

Yes, it’s the one.

alando · March 26, 2019, 1:47am

What is gain for again?

sgugger · March 26, 2019, 1:48am

It’s the multiplier you have to use to take into account the slope of your leaky ReLU when initializing your layers.

devforfu · March 26, 2019, 1:49am

Here is the notebook about sqrt(5) discussed in the lecture.

alando · March 26, 2019, 1:49am

according to Kaiming initializations?

sgugger · March 26, 2019, 1:49am

Yes, it’s the formula in the paper we mentioned last week.
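
As a rough sketch of that formula: for a leaky ReLU with negative slope `a`, the gain is sqrt(2 / (1 + a^2)), and the Kaiming standard deviation is the gain divided by sqrt(fan_in). The helper names below are made up for illustration, and the fan_in of 100 is an arbitrary example value.

```python
import math

def leaky_relu_gain(a):
    """Gain for a leaky ReLU with negative slope `a`: sqrt(2 / (1 + a^2))."""
    return math.sqrt(2.0 / (1.0 + a * a))

def kaiming_std(fan_in, a=0.0):
    """Target std of the weights under Kaiming init (fan_in mode)."""
    return leaky_relu_gain(a) / math.sqrt(fan_in)

print(leaky_relu_gain(0.0))           # plain ReLU: sqrt(2)
print(leaky_relu_gain(1.0))           # slope 1 is just a linear layer: gain 1
print(leaky_relu_gain(math.sqrt(5)))  # the sqrt(5) case: sqrt(1/3)
print(kaiming_std(100, a=0.0))
```

With a = sqrt(5) the gain drops to sqrt(1/3), well below the sqrt(2) that a plain ReLU calls for.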

alando · March 26, 2019, 1:50am

great, thanks for clarifying

nok · March 26, 2019, 1:51am

Why uniform instead of normal distribution here?

sgugger · March 26, 2019, 1:52am

That’s the default PyTorch init.
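
For what it's worth, the choice of distribution does not change the target scale: a uniform distribution on [-bound, bound] has standard deviation bound / sqrt(3), so picking bound = sqrt(3) * std matches the std a normal init would use. Below is a small standard-library check of that relationship; the helper name, seed, sample count, and std value are arbitrary illustrative choices.

```python
import math
import random

def uniform_bound(std):
    """Half-width of a zero-centred uniform distribution with the given std."""
    return math.sqrt(3.0) * std

random.seed(0)
target_std = 0.1
bound = uniform_bound(target_std)

# Draw samples from U(-bound, bound) and measure their empirical std.
samples = [random.uniform(-bound, bound) for _ in range(100_000)]
mean = sum(samples) / len(samples)
emp_std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(bound, emp_std)
```

The empirical std should land close to the 0.1 target, with every sample staying inside the bound.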