During the initialization step, we use the method we defined, “init_params”, to generate the weights as a vector and the bias as a scalar. Each epoch then uses the gradient and the learning rate to adjust the weights so that the model more closely approximates the training labels and makes more accurate predictions.

But we don’t adjust the bias in a similar manner. Why not? Is this because the weights are disproportionately more important to the overall outcome than the bias, since they’re higher-order in a Big-O notation sense (i.e. they multiply x^1, whereas the bias can be thought of as being “multiplied” by 1, aka x^0)?

First, I would like to point out that the weights are updated every *iteration*, not every epoch. One epoch consists of (length of the dataset)/(batch size) iterations and represents one full pass over the entire dataset.
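With some hypothetical numbers (not from the book), the epoch/iteration relationship works out like this:

```python
# Hypothetical sizes for illustration only.
dataset_size = 12_480   # number of training examples
batch_size = 64         # examples per mini-batch

# One epoch = one full pass over the dataset,
# so it comprises this many parameter updates (iterations):
iterations_per_epoch = dataset_size // batch_size
print(iterations_per_epoch)  # 195 updates per epoch
```

(If the dataset size isn’t an exact multiple of the batch size, the last smaller batch is either dropped or kept depending on the DataLoader settings.)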

Second, the biases *are updated* every single iteration during training. You can see this in chapter 4 as well. Is there a reason you thought the biases were not being updated?

Oh, I see it now:

```
for p in params:  # params contains the bias as well as the weights
    p.data -= p.grad*lr
```
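For anyone else following along, here is a minimal self-contained sketch (with assumed shapes and made-up data, not the book’s exact code) showing that the bias sits inside `params` and receives the same gradient update as the weights:

```python
import torch

torch.manual_seed(0)  # make the sketch reproducible

# Assumed shapes: a weight vector for 3 features and a scalar bias,
# both tracked by autograd.
weights = torch.randn(3, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)
params = (weights, bias)  # the bias is one of the params

# Made-up mini-batch of 5 examples.
x = torch.randn(5, 3)
targets = torch.randn(5)

preds = x @ weights + bias                 # bias participates in the prediction
loss = ((preds - targets) ** 2).mean()     # MSE loss
loss.backward()                            # populates .grad for weights AND bias

lr = 0.1
for p in params:                           # this loop updates weights AND bias
    p.data -= p.grad * lr
    p.grad.zero_()                         # reset so gradients don't accumulate

print(bias)  # no longer zero: the bias was updated too
```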

I missed that the bias is included in `params`. I think this was a simple oversight on my part; there’s a lot of information to absorb lol. Nevertheless, I learned something about the difference between iterations and epochs. Thank you for clarifying!
