(This is a wiki post - please edit!)

## Errata

- The layer and instance norm code in the video uses `std` instead of `var`. This is fixed in the notebook.
- I said `binomial` when I meant `binary`. This is also shown incorrectly in the XL spreadsheet (now fixed).
- The variance of a batch of one calculates to 0, not infinity (with some technical exceptions). Therefore BatchNorm would attempt to scale the filter to infinity.
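To make the first and last errata concrete, here is a minimal NumPy sketch (not the course code; the variable names are mine) of the `std`/`var` fix and of why a batch of one breaks BatchNorm:

```python
import numpy as np

x = np.random.randn(8, 32)  # a toy batch of activations
eps = 1e-5
mean = x.mean(axis=1, keepdims=True)

# Buggy version (as in the video): `std` used where `var` was meant,
# so the code effectively divides by sqrt(std) rather than sqrt(var).
std = x.std(axis=1, keepdims=True)
buggy = (x - mean) / np.sqrt(std + eps)

# Fixed version (as in the notebook): divide by sqrt(var + eps),
# i.e. by (a slightly smoothed) standard deviation.
var = x.var(axis=1, keepdims=True)
fixed = (x - mean) / np.sqrt(var + eps)

# Batch of one: the variance is 0, so the scale 1/sqrt(0 + eps) blows up,
# which is why BatchNorm would try to scale the filter to infinity.
single = np.array([[3.0]])
print(single.var(axis=0))  # [0.]
```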

## Lesson resources

## Papers mentioned this week

- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Layer Normalization
- Instance Normalization: The Missing Ingredient for Fast Stylization
- Group Normalization
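The four papers above differ mainly in *which axes* the mean and variance are computed over. A hedged NumPy sketch of that difference, assuming NCHW tensors (this is my own summary, not code from the papers):

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    # Subtract the mean and divide by sqrt(var + eps) over the given axes.
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(8, 4, 6, 6)  # (batch N, channels C, height H, width W)

batch_norm    = normalize(x, (0, 2, 3))  # per channel, across the whole batch
layer_norm    = normalize(x, (1, 2, 3))  # per sample, across all channels
instance_norm = normalize(x, (2, 3))     # per sample and per channel

# Group norm: split C into groups, then normalize within each group per sample.
g = 2
group_norm = normalize(x.reshape(8, g, 4 // g, 6, 6), (2, 3, 4)).reshape(x.shape)
```

(The real layers also learn a per-channel scale and shift, omitted here for brevity.)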

## Notes and other resources

- Annotated notebooks for Lessons 8 - 12
- Lesson 10 Notes, by @Lankinen
- New learning rate schedule based on the beta probability distribution function!
- How Convolutions Work - A Mini-Tutorial
- Interpreting the colorful histograms used in this lesson
- Lesson notebooks annotated with time point links to the lesson video on YouTube

## Other relevant papers

## Papers for next week

- All you need is a good init
- mixup: Beyond Empirical Risk Minimization
- Rethinking the Inception Architecture for Computer Vision (label smoothing is in part 7)
- Adam: A Method for Stochastic Optimization
- Decoupled Weight Decay Regularization (AdamW)
- Bag of Tricks for Image Classification with Convolutional Neural Networks