(This is a wiki post - please edit!)

## Errata

- The layer and instance norm code in the video uses `std` instead of `var`. This is fixed in the notebook.
- I said `binomial` when I meant `binary`. Also shown incorrectly in the XL spreadsheet (now fixed).
- The variance of a batch of one calculates to 0, not infinity. (With some technical exceptions.) Therefore BatchNorm would attempt to scale the filter to infinity.
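To see why the first and third errata matter, here is a minimal NumPy sketch (not the lesson's notebook code) of a layer-norm-style normalization: `eps` must be added to the *variance* before taking the square root, and the variance of a single-element batch is 0, which is what would make BatchNorm blow up on a batch size of one.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row over the feature (last) axis.
    # Note: eps is added to the variance, then we take the square
    # root -- using std in place of var here was the bug noted above.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(4, 10)
y = layer_norm(x)
print(y.mean(axis=-1))   # each row now has mean ~0
print(y.var(axis=-1))    # and variance ~1

# A "batch" of one element has variance 0, not infinity:
print(np.var([3.0]))     # 0.0
```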

## Lesson resources

## Papers mentioned this week

- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Layer Normalization
- Instance Normalization: The Missing Ingredient for Fast Stylization
- Group Normalization

## Additional Resources

- Interpreting the colorful histograms used in this lesson
- Lesson notebooks annotated with time point links to the lesson video on YouTube

## Other relevant papers

## Papers for next week

- All you need is a good init
- mixup: Beyond Empirical Risk Minimization
- Rethinking the Inception Architecture for Computer Vision (label smoothing is in section 7)
- Adam: A Method for Stochastic Optimization
- Decoupled Weight Decay Regularization (AdamW)
- Bag of Tricks for Image Classification with Convolutional Neural Networks

## Lesson notes and other resources

- Lesson notes by @Lankinen
- The github repo for the **Deep Learning From the Foundations Study Group** contains annotated and slightly reworked versions of notebooks 04, 05, 05a and 05b, along with slides about topics in Lesson 10.