Lesson 9 Discussion & Wiki (2019)

Please use this thread to discuss lesson 9. Since this is Part 2, feel free to ask more advanced or slightly tangential questions - although if your question is not related to the lesson much at all, please use a different topic. Note that this is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better!

Thread for general chit chat (we won’t be monitoring this).

Lesson resources


  • In 02b_initializing.ipynb, a few minor corrections were made where variance and standard deviation had been mixed up. Thanks to Aman Madaan for pointing those out.
  • In 03_minibatch_training.ipynb, there is a small error in the Optimizer() class. The step() method should be:
    def step(self):
        with torch.no_grad():
            for p in self.params: p -= p.grad * self.lr

instead of

    def step(self):
        with torch.no_grad():
            for p in self.params: p -= p.grad * lr

(The self. prefix is missing on lr in the update formula; the method still works only because lr was declared as a global variable earlier.)
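
For context, here is a minimal, self-contained sketch of the corrected class (the zero_grad method and the default lr value below are assumptions for illustration, not copied from the notebook):

    import torch

    class Optimizer():
        def __init__(self, params, lr=0.5):
            # params is any iterable of tensors; both are stored on the instance
            self.params, self.lr = list(params), lr

        def step(self):
            # update each parameter in place, using the *instance* attribute self.lr
            with torch.no_grad():
                for p in self.params: p -= p.grad * self.lr

        def zero_grad(self):
            # reset gradients so they don't accumulate across steps
            for p in self.params: p.grad.data.zero_()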


Notes and other resources

Talks and blog posts


This lecture deserves a "How to Train Your [Dragon] Model" poster; someone more artistic than me, please make it :slight_smile:


What is the refactoring process by which these nb_XX.py Python files (and their sometimes-duplicated code) get turned into things like fastai.text.data and so forth?

The refactoring happens in the notebooks. We only turned them into a library once they looked nice and cosy.

Is that process going to be covered as well?

I believe it is kind of covered right now =)

It’s what Jeremy does in each of those notebooks. Later in this lesson, you’ll see a training loop and a CallbackHandler that are even better than what we have inside the library.


I don’t understand: wouldn’t it make more sense for the “leak” value to be the negative of the slope, rather than the slope?

The slope of the left-hand side of the leaky ReLU is still positive.


Yes, I didn’t catch Jeremy saying minus, but if he did say minus, it was just a mistake. The slope on the negative side is still positive.
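
A quick numerical check makes this concrete (a small illustration using PyTorch’s built-in leaky ReLU, not the lesson’s own code):

    import torch
    import torch.nn.functional as F

    x = torch.tensor([-2.0, -1.0, 1.0, 2.0])
    print(F.leaky_relu(x, negative_slope=0.01))
    # tensor([-0.0200, -0.0100,  1.0000,  2.0000])
    # For x < 0 the output is 0.01 * x: as x decreases, the output decreases,
    # so the slope on the negative side is +0.01, a small *positive* number.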

Which notebook did Jeremy say we’re following? 02a…?

Yes, that’s the one.

What is gain for again?


It’s the multiplier you have to use to take into account the slope of your leaky ReLU when initializing your layers.
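
Concretely, for a leaky ReLU with negative slope a, the gain is sqrt(2 / (1 + a²)); PyTorch exposes the same computation as torch.nn.init.calculate_gain (the slope 0.01 below is just an example value):

    import math
    from torch.nn import init

    a = 0.01  # example negative slope
    gain = math.sqrt(2.0 / (1 + a ** 2))         # Kaiming gain for leaky ReLU
    print(gain)                                  # ~1.414
    print(init.calculate_gain('leaky_relu', a))  # PyTorch gives the same value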


Here is the notebook about sqrt(5) discussed in the lecture.

According to Kaiming initialization?

Yes, it’s the formula in the paper we mentioned last week.
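
As a reminder, the Kaiming formula scales the weights so that std = gain / sqrt(fan_in). A hand-rolled sketch (the function name and the layer sizes below are made up for illustration):

    import math
    import torch

    def kaiming_normal_sketch(fan_out, fan_in, a=0.0):
        # gain = sqrt(2 / (1 + a^2)); with a = 0 this is the plain-ReLU
        # case, giving weights with std = sqrt(2 / fan_in)
        gain = math.sqrt(2.0 / (1 + a ** 2))
        std = gain / math.sqrt(fan_in)
        return torch.randn(fan_out, fan_in) * std

    w = kaiming_normal_sketch(50, 784)
    print(w.std())  # ~ sqrt(2/784) ~ 0.0505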

Great, thanks for clarifying :+1:

Why a uniform instead of a normal distribution here?


That’s the default PyTorch init.
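
The two can be made equivalent in variance: a uniform U(-b, b) has standard deviation b / sqrt(3), so the uniform version uses bound = sqrt(3) * std to hit the same spread as the normal one. A small check (the fan_in and gain values here are made up for illustration):

    import math
    import torch

    fan_in, gain = 512, math.sqrt(2.0)
    std = gain / math.sqrt(fan_in)    # target std from the Kaiming formula
    bound = math.sqrt(3.0) * std      # U(-bound, bound) has std = bound / sqrt(3)
    w = torch.zeros(1000, fan_in).uniform_(-bound, bound)
    print(w.std())                    # ~ std, matching the normal version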
