Jeremy’s Breadcrumb Trails from Lesson 8
Don’t be overwhelmed… just sniff out a few tantalizing trails and follow your nose!
- Math notation: List of mathematical symbols - Simple English Wikipedia, the free encyclopedia
- Get the Greek letter for, and the LaTeX version of, a symbol you draw: Detexify LaTeX handwritten symbol recognition
- Python Fire, “a library for automatically generating command line interfaces (CLIs) from absolutely any Python object”: GitHub - google/python-fire (a minimal usage sketch is below the list)
- Frobenius norm: the square root of the sum of the squares of the elements in the matrix. Frobenius Norm -- from Wolfram MathWorld (a short computation sketch is below the list)
- Broadcasting rules: Broadcasting semantics — PyTorch documentation (a small example is sketched below the list)
- Einstein summation convention: Tim Rocktäschel’s post on einsum (see the torch.einsum sketch below the list)
- Halide, a language for fast, portable computation on images and tensors: http://halide-lang.org/
- Polyhedral compilation: https://polyhedral.info/
- Chris Lattner (creator of Swift): Chris Lattner's Homepage
- “It’s all about initialization”: a 10,000-layer-deep neural net trained with no normalization layers! “Fixup Initialization: Residual Learning Without Normalization” [1901.09321]
- Homework: read at least section 2.2 of the paper “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” [1502.01852], in which Kaiming He and his collaborators (the team behind ResNet) introduce (1) the “He initialization” method, a great improvement over Glorot/Xavier initialization, and (2) the PReLU (Parametric ReLU) activation function. (A short sketch comparing Glorot and He initialization is below the list.)
- Read the 2010 initialization paper by Xavier Glorot and Yoshua Bengio, “Understanding the difficulty of training deep feedforward neural networks”: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf Jeremy said we will be implementing much of it in this course.
- Look into torch.nn.Conv2d(): why is there a multiplier of sqrt(5) in the initialization? Jeremy pointed out that it is not documented, and he thinks it’s incorrect. (See the sketch of the default init below the list.)
- “The Matrix Calculus You Need For Deep Learning” by Terence Parr and Jeremy Howard
- Jeremy has proposed a further improvement over “He initialization” in this lecture! He suggested testing it out… see the thread Shifted ReLU (-0.5). (A tiny sketch of the idea is below.)
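A minimal sketch of how Python Fire turns a plain function into a CLI. The file name `greet.py` and the function are made up for illustration; `fire.Fire()` is the library’s real entry point.

```python
# greet.py -- made-up example script
import fire

def greet(name, shout=False):
    """Return a greeting; Fire maps the arguments to CLI arguments automatically."""
    msg = f"Hello, {name}!"
    return msg.upper() if shout else msg

if __name__ == "__main__":
    fire.Fire(greet)

# usage:
#   python greet.py world          ->  Hello, world!
#   python greet.py world --shout  ->  HELLO, WORLD!
```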
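The Frobenius norm from the bullet above, computed directly and with PyTorch’s built-in norm; the matrix here is just a random example.

```python
import torch

m = torch.randn(3, 4)                # any example matrix

frob_manual  = (m * m).sum().sqrt()  # square root of the sum of the squared elements
frob_builtin = torch.linalg.norm(m)  # for a 2-D tensor this is the Frobenius norm

print(frob_manual, frob_builtin)     # identical up to floating-point error
```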
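A small illustration of the broadcasting rules: shapes are aligned from the right, and each pair of dimensions must either match or one of them must be 1 (or absent), in which case it is stretched.

```python
import torch

a = torch.randn(5, 1, 4)
b = torch.randn(3, 1)

# Align shapes from the right: (5, 1, 4) vs (3, 1)
#   4 vs 1 -> 4,   1 vs 3 -> 3,   5 vs (missing) -> 5
c = a + b
print(c.shape)                       # torch.Size([5, 3, 4])

# A typical use: normalise the columns of a matrix in one line.
x = torch.randn(10, 4)
x_norm = (x - x.mean(dim=0)) / x.std(dim=0)   # the (4,) stats broadcast over 10 rows
```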
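Einstein summation in PyTorch via torch.einsum; the examples below are an ordinary matrix multiply and a batched matrix multiply written in index notation.

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)

# The repeated index k is summed over; the free indices i and j index the output.
c = torch.einsum('ik,kj->ij', a, b)
assert torch.allclose(c, a @ b)

# Batched matrix multiply: the batch index b is carried through unsummed.
x = torch.randn(8, 2, 3)
y = torch.randn(8, 3, 5)
bmm = torch.einsum('bik,bkj->bij', x, y)      # same result as torch.bmm(x, y)
```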
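A rough sketch of the two initialization schemes in the reading list. The scaling factors follow the papers (Glorot: sqrt(2/(fan_in+fan_out)); He, section 2.2: sqrt(2/fan_in) for ReLU); the 50-layer check at the end is only an illustrative experiment.

```python
import math
import torch

def glorot_normal(fan_in, fan_out):
    # Glorot & Bengio (2010): std = sqrt(2 / (fan_in + fan_out)),
    # derived assuming a roughly linear (symmetric) activation.
    return torch.randn(fan_out, fan_in) * math.sqrt(2.0 / (fan_in + fan_out))

def he_normal(fan_in, fan_out):
    # He et al. (2015), section 2.2: ReLU zeroes half the activations,
    # so an extra factor of 2 is needed: std = sqrt(2 / fan_in).
    return torch.randn(fan_out, fan_in) * math.sqrt(2.0 / fan_in)

# Illustrative check: push data through 50 ReLU layers and watch the activations.
x = torch.randn(1000, 256)
for _ in range(50):
    x = torch.relu(x @ he_normal(256, 256).t())
print(x.mean(), x.std())   # stays in a sensible range with He init; swap in
                           # glorot_normal and the activations shrink layer by layer
```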
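On the sqrt(5) question: around the time of this lesson, PyTorch’s conv layers initialized their weights with kaiming_uniform_(a=math.sqrt(5)). The snippet below reproduces that call so you can inspect what it does; check the source of your own PyTorch version before relying on this.

```python
import math
import torch
from torch import nn

conv = nn.Conv2d(3, 32, kernel_size=3)
print(conv.weight.std())             # whatever the default init gives you

# Reproducing the default by hand (as of the PyTorch versions around this lesson):
w = torch.empty(32, 3, 3, 3)
nn.init.kaiming_uniform_(w, a=math.sqrt(5))
# With a=sqrt(5) the gain is sqrt(2 / (1 + 5)) = sqrt(1/3), so the uniform bound
# works out to 1/sqrt(fan_in), i.e. the old uniform(-1/sqrt(fan_in), 1/sqrt(fan_in))
# scheme rather than an init derived for ReLU, which is what the lesson questions.
print(w.std())

# A ReLU-appropriate alternative discussed in the lesson:
nn.init.kaiming_normal_(conv.weight, nonlinearity='relu')
```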
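A tiny sketch of the shifted-ReLU idea from the forum thread. The module name ShiftedReLU and the 0.5 constant mirror the proposal in the lecture; treat this as something to experiment with, not a settled recipe.

```python
import torch
from torch import nn

class ShiftedReLU(nn.Module):
    """ReLU shifted down by 0.5 so the post-activation mean sits nearer zero
    (a plain ReLU's output always has a positive mean)."""
    def forward(self, x):
        return torch.relu(x) - 0.5

lin = nn.Linear(784, 50)
nn.init.kaiming_normal_(lin.weight, nonlinearity='relu')   # He init, as in the lesson
nn.init.zeros_(lin.bias)

x = torch.randn(64, 784)
a = ShiftedReLU()(lin(x))
print(a.mean(), a.std())   # the mean sits much closer to 0 than with a plain ReLU
```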