Super impressed with the quality of the information in the MOOC. Big thanks to Jeremy and Rachel.

My question is about the explanation of a simple, fully-connected neural net that Jeremy gives in “Lesson 4: Deep Learning 2018” at time 45:15 (see https://youtu.be/gbceqO8PpBg?t=46m15s).

As an example, Jeremy takes a vector of 20 continuous variables and does a matrix multiply by a 20x100 matrix. He says this is what a matrix multiplication does (agreed) and this is what a linear layer does in Deep Learning. I get that the matrix holds the weights, but what happened to the bias values in the example?

It’s just an illustrative example meant to show the main idea. The biases are simply added to the result after the multiplication.
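As a minimal sketch of what that looks like with the shapes from the question (NumPy is my choice here, and the random values and variable names are just placeholders, not anything from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=20)          # input: 20 continuous variables
W = rng.normal(size=(20, 100))   # weight matrix, 20x100 as in the question
b = rng.normal(size=100)         # one bias per output unit (placeholder values)

out_no_bias = x @ W              # the matrix multiply shown in the example
out = x @ W + b                  # multiply first, then add the bias vector

print(out_no_bias.shape, out.shape)
```

Both results have 100 elements; the bias only shifts each of the 100 outputs by a constant.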

Alternatively, you can treat the biases as an additional column in the weight matrix. To do that, concatenate the weight matrix and the bias vector, placing the biases in an additional last column. Then extend the input vector with the fixed value “1” (that is, the input vector’s length increases by 1, and its last element is always equal to 1). If you work out the matrix-vector multiplication, you’ll see that [A b] \times [x; 1] = A \times x + b, where [A b] is A with b appended as its last column and [x; 1] is x with 1 appended as its last element.
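Here is a small sketch to check the augmented-matrix trick numerically. It assumes NumPy and made-up random values. Note that the formula uses the column-vector convention (A is 100x20 and multiplies x from the left), while the lecture example multiplies a row vector by a 20x100 matrix; in that layout the bias becomes an extra last row instead of an extra column, which the second half shows.

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.normal(size=(100, 20))   # weights, column-vector convention: y = A @ x + b
b = rng.normal(size=100)         # biases (placeholder values)
x = rng.normal(size=20)          # input vector

# Append b as the last column of A, and a constant 1 to the input.
A_aug = np.hstack([A, b[:, None]])   # shape (100, 21)
x_aug = np.append(x, 1.0)            # shape (21,)

y_ref = A @ x + b
y_aug = A_aug @ x_aug
print(np.allclose(y_aug, y_ref))

# Same idea in the row-vector layout (x times a 20x100 matrix):
# the bias is stacked as an extra last row.
W = A.T                              # shape (20, 100)
W_aug = np.vstack([W, b])            # shape (21, 100)
y_row = x_aug @ W_aug
print(np.allclose(y_row, y_ref))
```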

Also, you might not need the bias component at all in some cases. Nowadays, people frequently use Batch Normalization layers immediately after the linear (or other) layers. BatchNorm normalizes the layer’s outputs to speed up learning and then applies its own learned scale and shift, and that shift acts as a bias. Two consecutive biases would just add up to a single one, so there is no need for both. In those cases, people just drop the bias from the linear layers.
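To see why the linear layer's bias is redundant in that setup, here is a rough sketch with a hand-written, simplified batch norm (training-mode batch statistics only, no running averages). The `batch_norm` helper and all values are my own illustration, not a library API.

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    """Simplified batch norm: normalize each feature over the batch,
    then apply a learned scale (gamma) and shift (beta)."""
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    return gamma * (z - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)

X = rng.normal(size=(32, 20))    # a batch of 32 inputs with 20 features
W = rng.normal(size=(20, 100))   # linear layer weights
b = rng.normal(size=100)         # linear layer bias (placeholder values)
gamma = np.ones(100)             # BatchNorm scale
beta = rng.normal(size=100)      # BatchNorm shift, which plays the role of a bias

with_bias = batch_norm(X @ W + b, gamma, beta)
without_bias = batch_norm(X @ W, gamma, beta)

# Subtracting the batch mean cancels any constant added per feature,
# so the linear layer's bias has no effect on the output.
print(np.allclose(with_bias, without_bias))
```

In PyTorch, this corresponds to creating the layer as `nn.Linear(20, 100, bias=False)` and following it with `nn.BatchNorm1d(100)`.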