Please use this thread to discuss lesson 9. Since this is Part 2, feel free to ask more advanced or slightly tangential questions - although if your question is not related to the lesson much at all, please use a different topic. Note that this is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better!
Thread for general chit chat (we won’t be monitoring this).
Errata

- In nb 02b_initializing.ipynb, a few minor corrections were made where a variance and a std were mixed up. Thanks to Aman Madaan for pointing those out.
- In 03_minibatch_training.ipynb, there is a small error in the step() method. It should be:

```python
def step(self):
    with torch.no_grad():
        for p in self.params:
            p -= p.grad * self.lr
```

not:

```python
def step(self):
    with torch.no_grad():
        for p in self.params:
            p -= p.grad * lr
```

(The second version is missing `self.` before `lr` in the learning rate update formula – the method still works anyway, but only because `lr` was declared as a global variable earlier.)
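For context, here is a minimal sketch of the kind of Optimizer class this step() belongs to. The constructor and the zero_grad() method shown are assumptions based on the usual pattern from the notebook, not a verbatim copy:

```python
import torch

class Optimizer():
    # Assumed constructor: store the parameters and the learning rate.
    def __init__(self, params, lr=0.5):
        self.params, self.lr = list(params), lr

    def step(self):
        # Update each parameter in place, outside autograd tracking.
        with torch.no_grad():
            for p in self.params:
                p -= p.grad * self.lr

    def zero_grad(self):
        # Reset gradients so they don't accumulate across batches
        # (see the "set the gradients manually to zero" link below).
        for p in self.params:
            p.grad.data.zero_()
```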
Things mentioned in the lesson
- Self-Normalizing Neural Networks (SELU)
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (orthogonal initialization)
- All you need is a good init
continued from last time:
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification – 2015 paper that first surpassed human-level performance on ImageNet, and introduced PReLU and Kaiming initialization (by the same team that later created ResNet).
- Understanding the difficulty of training deep feedforward neural networks – paper that introduced Xavier initialization (both schemes are sketched in code after this list)
- Fixup Initialization: Residual Learning Without Normalization – paper showing that, with careful initialisation, a 10,000-layer network can be trained without normalisation
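To make the two initialisation schemes above concrete, here is a minimal sketch using PyTorch's built-in init functions; the layer sizes are arbitrary example values:

```python
import torch.nn as nn
import torch.nn.init as init

lin = nn.Linear(784, 50)  # example layer sizes

# Xavier/Glorot init: variance scaled by fan_in + fan_out,
# designed for symmetric activations such as tanh.
init.xavier_normal_(lin.weight)

# Kaiming/He init: variance scaled by fan_in only, with a gain
# that compensates for ReLU zeroing out half the activations.
init.kaiming_normal_(lin.weight, nonlinearity='relu')
```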
Other helpful resources
- Sylvain’s talk, An Infinitely Customizable Training Loop (from the NYC PyTorch meetup) and the slides that go with it
- Why do we need to set the gradients manually to zero in pytorch?
- Tensorboard Integration Thread
- What is torch.nn really?
- Blog post explaining decorators
- Primer on Python Decorators
- Introduction to Mixed Precision Training, Benchmarks using fastai
- Explanation and derivation of LogSumExp (a short numeric sketch follows this list)
- Blog post about callbacks in fastai #1
- Blog post about callbacks in fastai #2
- Blog post about weight initialization
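As a quick companion to the LogSumExp link above: the trick is to pull the maximum out before exponentiating, since logsumexp(x) = a + log(sum_i exp(x_i - a)) for a = max(x), which keeps every exp() argument at or below zero and so avoids overflow. A minimal sketch (the hand-rolled version is just for illustration; PyTorch also ships torch.logsumexp):

```python
import torch

def logsumexp(x):
    # Subtract the max before exponentiating so exp() never overflows;
    # adding it back outside the log gives the same result mathematically.
    a = x.max()
    return a + (x - a).exp().sum().log()

x = torch.tensor([1000., 1001., 1002.])  # naive log(exp(x).sum()) -> inf
print(logsumexp(x))                       # tensor(1002.4076)
print(torch.logsumexp(x, dim=0))          # built-in, same value
```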