Jeremy’s Breadcrumb Trails from Lesson 8
Don’t be overwhelmed… just sniff out a few tantalizing trails and follow your nose!
- Math notation: List of mathematical symbols - Simple English Wikipedia, the free encyclopedia
- Get the Greek letter for, and the LaTeX version of, a symbol you draw: Detexify LaTeX handwritten symbol recognition
- Python Fire, “a library for automatically generating command line interfaces (CLIs) from absolutely any Python object”: GitHub - google/python-fire (a minimal usage sketch is below the list)
- Frobenius norm: the square root of the sum of the squares of the elements in the matrix. Frobenius Norm -- from Wolfram MathWorld (a short computation sketch is below the list)
- Broadcasting rules: Broadcasting semantics — PyTorch documentation (a small example is sketched below the list)
- Einstein summation convention: Tim Rocktäschel’s post on einsum (see the torch.einsum sketch below the list)
- Halide, a language for fast, portable computation on images and tensors: http://halide-lang.org/
- Polyhedral compilation: https://polyhedral.info/
- Chris Lattner (creator of Swift): Chris Lattner's Homepage
- “It’s all about initialization”: a 10,000-layer-deep neural net trained with no normalization layers! “Fixup Initialization: Residual Learning Without Normalization” [1901.09321]
- Homework: read at least section 2.2 of the paper “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” [1502.01852], in which Kaiming He and his collaborators (the team behind ResNet) introduce (1) the “He initialization” method, a great improvement over Glorot/Xavier initialization, and (2) the PReLU (Parametric ReLU) activation function. (A short sketch comparing Glorot and He initialization is below the list.)
- Read the 2010 initialization paper by Xavier Glorot and Yoshua Bengio, “Understanding the difficulty of training deep feedforward neural networks”: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf Jeremy said we will be implementing much of it in this course.
- Look into torch.nn.Conv2d(): why is there a multiplier of sqrt(5) in the initialization? Jeremy pointed out that it is not documented, and he thinks it’s incorrect. (See the sketch of the default init below the list.)
- “The Matrix Calculus You Need For Deep Learning” by Terence Parr and Jeremy Howard
- Jeremy has proposed a further improvement over “He initialization” in this lecture! He suggested testing it out… see the thread Shifted ReLU (-0.5). (A tiny sketch of the idea is below.)
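A minimal sketch of how Python Fire turns a plain function into a CLI. The file name `greet.py` and the function are made up for illustration; `fire.Fire()` is the library’s real entry point.

```python
# greet.py -- made-up example script
import fire

def greet(name, shout=False):
    """Return a greeting; Fire maps the arguments to CLI arguments automatically."""
    msg = f"Hello, {name}!"
    return msg.upper() if shout else msg

if __name__ == "__main__":
    fire.Fire(greet)

# usage:
#   python greet.py world          ->  Hello, world!
#   python greet.py world --shout  ->  HELLO, WORLD!
```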
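The Frobenius norm from the bullet above, computed directly and with PyTorch’s built-in norm; the matrix here is just a random example.

```python
import torch

m = torch.randn(3, 4)                # any example matrix

frob_manual  = (m * m).sum().sqrt()  # square root of the sum of the squared elements
frob_builtin = torch.linalg.norm(m)  # for a 2-D tensor this is the Frobenius norm

print(frob_manual, frob_builtin)     # identical up to floating-point error
```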
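A small illustration of the broadcasting rules: shapes are aligned from the right, and each pair of dimensions must either match or one of them must be 1 (or absent), in which case it is stretched.

```python
import torch

a = torch.randn(5, 1, 4)
b = torch.randn(3, 1)

# Align shapes from the right: (5, 1, 4) vs (3, 1)
#   4 vs 1 -> 4,   1 vs 3 -> 3,   5 vs (missing) -> 5
c = a + b
print(c.shape)                       # torch.Size([5, 3, 4])

# A typical use: normalise the columns of a matrix in one line.
x = torch.randn(10, 4)
x_norm = (x - x.mean(dim=0)) / x.std(dim=0)   # the (4,) stats broadcast over 10 rows
```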
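Einstein summation in PyTorch via torch.einsum; the examples below are an ordinary matrix multiply and a batched matrix multiply written in index notation.

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)

# The repeated index k is summed over; the free indices i and j index the output.
c = torch.einsum('ik,kj->ij', a, b)
assert torch.allclose(c, a @ b)

# Batched matrix multiply: the batch index b is carried through unsummed.
x = torch.randn(8, 2, 3)
y = torch.randn(8, 3, 5)
bmm = torch.einsum('bik,bkj->bij', x, y)      # same result as torch.bmm(x, y)
```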
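A rough sketch of the two initialization schemes in the reading list. The scaling factors follow the papers (Glorot: sqrt(2/(fan_in+fan_out)); He, section 2.2: sqrt(2/fan_in) for ReLU); the 50-layer check at the end is only an illustrative experiment.

```python
import math
import torch

def glorot_normal(fan_in, fan_out):
    # Glorot & Bengio (2010): std = sqrt(2 / (fan_in + fan_out)),
    # derived assuming a roughly linear (symmetric) activation.
    return torch.randn(fan_out, fan_in) * math.sqrt(2.0 / (fan_in + fan_out))

def he_normal(fan_in, fan_out):
    # He et al. (2015), section 2.2: ReLU zeroes half the activations,
    # so an extra factor of 2 is needed: std = sqrt(2 / fan_in).
    return torch.randn(fan_out, fan_in) * math.sqrt(2.0 / fan_in)

# Illustrative check: push data through 50 ReLU layers and watch the activations.
x = torch.randn(1000, 256)
for _ in range(50):
    x = torch.relu(x @ he_normal(256, 256).t())
print(x.mean(), x.std())   # stays in a sensible range with He init; swap in
                           # glorot_normal and the activations shrink layer by layer
```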
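On the sqrt(5) question: around the time of this lesson, PyTorch’s conv layers initialized their weights with kaiming_uniform_(a=math.sqrt(5)). The snippet below reproduces that call so you can inspect what it does; check the source of your own PyTorch version before relying on this.

```python
import math
import torch
from torch import nn

conv = nn.Conv2d(3, 32, kernel_size=3)
print(conv.weight.std())             # whatever the default init gives you

# Reproducing the default by hand (as of the PyTorch versions around this lesson):
w = torch.empty(32, 3, 3, 3)
nn.init.kaiming_uniform_(w, a=math.sqrt(5))
# With a=sqrt(5) the gain is sqrt(2 / (1 + 5)) = sqrt(1/3), so the uniform bound
# works out to 1/sqrt(fan_in), i.e. the old uniform(-1/sqrt(fan_in), 1/sqrt(fan_in))
# scheme rather than an init derived for ReLU, which is what the lesson questions.
print(w.std())

# A ReLU-appropriate alternative discussed in the lesson:
nn.init.kaiming_normal_(conv.weight, nonlinearity='relu')
```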
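A tiny sketch of the shifted-ReLU idea from the forum thread. The module name ShiftedReLU and the 0.5 constant mirror the proposal in the lecture; treat this as something to experiment with, not a settled recipe.

```python
import torch
from torch import nn

class ShiftedReLU(nn.Module):
    """ReLU shifted down by 0.5 so the post-activation mean sits nearer zero
    (a plain ReLU's output always has a positive mean)."""
    def forward(self, x):
        return torch.relu(x) - 0.5

lin = nn.Linear(784, 50)
nn.init.kaiming_normal_(lin.weight, nonlinearity='relu')   # He init, as in the lesson
nn.init.zeros_(lin.bias)

x = torch.randn(64, 784)
a = ShiftedReLU()(lin(x))
print(a.mean(), a.std())   # the mean sits much closer to 0 than with a plain ReLU
```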