Please use this thread to discuss lesson 9. Since this is Part 2, feel free to ask more advanced or slightly tangential questions - although if your question is not related to the lesson much at all, please use a different topic. Note that this is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better!
Thread for general chit chat (we won’t be monitoring this).
Errata

- In nb 02b_initializing.ipynb, a few minor corrections were made where a variance and a std were mixed up. Thanks to Aman Madaan for pointing those out.
- In 03_minibatch_training.ipynb, there is a small error in the step() method. It should be:

```python
def step(self):
    with torch.no_grad():
        for p in self.params:
            p -= p.grad * self.lr
```

not:

```python
def step(self):
    with torch.no_grad():
        for p in self.params:
            p -= p.grad * lr
```

(The second version is missing `self.` before `lr` in the learning rate update formula – the method still works anyway, but only because `lr` was declared as a global variable earlier.)
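For context, here is a minimal sketch of the kind of Optimizer class this step() belongs to. The constructor and the zero_grad() method shown are assumptions based on the usual pattern from the notebook, not a verbatim copy:

```python
import torch

class Optimizer():
    # Assumed constructor: store the parameters and the learning rate.
    def __init__(self, params, lr=0.5):
        self.params, self.lr = list(params), lr

    def step(self):
        # Update each parameter in place, outside autograd tracking.
        with torch.no_grad():
            for p in self.params:
                p -= p.grad * self.lr

    def zero_grad(self):
        # Reset gradients so they don't accumulate across batches
        # (see the "set the gradients manually to zero" link below).
        for p in self.params:
            p.grad.data.zero_()
```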
Things mentioned in the lesson
- Self-Normalizing Neural Networks (SELU)
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (orthogonal initialization)
- All you need is a good init
continued from last time:
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification – 2015 paper that first surpassed human-level performance on ImageNet, and introduced PReLU and Kaiming initialization (by the same team that later created ResNet).
- Understanding the difficulty of training deep feedforward neural networks – paper that introduced Xavier initialization (both schemes are sketched in code after this list)
- Fixup Initialization: Residual Learning Without Normalization – paper showing that, with careful initialisation, a 10,000-layer network can be trained without normalisation
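To make the two initialisation schemes above concrete, here is a minimal sketch using PyTorch's built-in init functions; the layer sizes are arbitrary example values:

```python
import torch.nn as nn
import torch.nn.init as init

lin = nn.Linear(784, 50)  # example layer sizes

# Xavier/Glorot init: variance scaled by fan_in + fan_out,
# designed for symmetric activations such as tanh.
init.xavier_normal_(lin.weight)

# Kaiming/He init: variance scaled by fan_in only, with a gain
# that compensates for ReLU zeroing out half the activations.
init.kaiming_normal_(lin.weight, nonlinearity='relu')
```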
Other helpful resources
- Sylvain’s talk, An Infinitely Customizable Training Loop (from the NYC PyTorch meetup) and the slides that go with it
- Why do we need to set the gradients manually to zero in pytorch?
- Tensorboard Integration Thread
- What is torch.nn really?
- Blog post explaining decorators
- Primer on Python Decorators
- Introduction to Mixed Precision Training, Benchmarks using fastai
- Explanation and derivation of LogSumExp (a short numeric sketch follows this list)
- Blog post about callbacks in fastai #1
- Blog post about callbacks in fastai #2
- Blog post about weight initialization
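As a quick companion to the LogSumExp link above: the trick is to pull the maximum out before exponentiating, since logsumexp(x) = a + log(sum_i exp(x_i - a)) for a = max(x), which keeps every exp() argument at or below zero and so avoids overflow. A minimal sketch (the hand-rolled version is just for illustration; PyTorch also ships torch.logsumexp):

```python
import torch

def logsumexp(x):
    # Subtract the max before exponentiating so exp() never overflows;
    # adding it back outside the log gives the same result mathematically.
    a = x.max()
    return a + (x - a).exp().sum().log()

x = torch.tensor([1000., 1001., 1002.])  # naive log(exp(x).sum()) -> inf
print(logsumexp(x))                       # tensor(1002.4076)
print(torch.logsumexp(x, dim=0))          # built-in, same value
```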