(This is a wiki post - please edit!)
Errata
- The layer norm and instance norm code in the video use `std` instead of `var`. This is fixed in the notebook.
- I said *binomial* when I meant *binary*. This was also shown incorrectly in the XL spreadsheet (now fixed).
- The variance of a batch of one calculates to 0, not infinity (with some technical exceptions). Dividing by that near-zero variance means BatchNorm would attempt to scale the filter to infinity.
Lesson resources
Papers mentioned this week
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Layer Normalization
- Instance Normalization: The Missing Ingredient for Fast Stylization
- Group Normalization
Additional Resources
- Interpreting the colorful histograms used in this lesson
- Lesson notebooks annotated with time point links to the lesson video on Youtube
Other relevant papers
Papers for next week
- All you need is a good init
- mixup: Beyond Empirical Risk Minimization
- Rethinking the Inception Architecture for Computer Vision (label smoothing is in section 7)
- Adam: A Method for Stochastic Optimization
- Decoupled Weight Decay Regularization (AdamW)
- Bag of Tricks for Image Classification with Convolutional Neural Networks
Lesson notes and other resources
- Lesson 10 Notes, by @Lankinen
- GitHub repo for the Deep Learning From the Foundations Study Group. A work in progress; so far it contains annotated and slightly reworked versions of notebooks 04, 05, 05a, 05b, and 06, along with slides about topics in Lesson 10.
- New learning rate schedule based on the beta probability distribution function!
- How Convolutions Work - A Mini-Tutorial