Lesson 4 - Bias

Hi! I’m currently at Lesson 4 of the Deep Learning Course for Coders. I’ve taken every course I needed to understand Stochastic Gradient Descent, and… I do!
But I have a small problem: I don’t understand why we have to add a bias to our parameters.
I know that we add a bias so that if the input is zero then the output won’t be zero, but why is that a problem, and why does it make our model fit better?
If we add a bias to those values, then the output that would’ve been zero will basically become the bias.
So we’re basically in the same place.
I don’t understand the concept itself, and I couldn’t find any source that goes into detail (at the level I’m currently at, at the beginning of Lesson 4).
Thank you!

Correct. The values that would have been zero will equal the bias, but only for the first epoch. The bias itself is updated by SGD using the gradient, so on the next pass those values won’t equal the initial bias anymore.

This is better than the alternative: if the values started at zero, the gradient calculation and update wouldn’t change them, so they would stay zero on the next pass and every pass after that (a dead neuron).
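Here’s a tiny PyTorch sketch of that point (my own illustration, not from the lesson): with a zero input, the weight’s gradient is zero, so only the bias gives SGD something to update.

```python
import torch

# A single "neuron" looking at one zero input (e.g. a white pixel encoded as 0).
x = torch.tensor(0.0)
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

# Without a bias: out = w*x = 0, and d(out)/dw = x = 0,
# so SGD has no gradient to move w with -- the output is stuck at zero.
out = w * x
out.backward()
print(w.grad)          # tensor(0.)

# With a bias: out = w*x + b, and d(out)/db = 1, so b receives a
# gradient and the output can move away from zero on the next pass.
w.grad = None
out = w * x + b
out.backward()
print(w.grad, b.grad)  # tensor(0.) tensor(1.)
```

(For simplicity the output itself is treated as the loss here; with a real loss the same zero-gradient problem applies to `w` whenever `x` is zero.)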

Also you should go through this discussion:

Thank you! I think I understand!
So the thing is… is it possible that one white pixel is more important than another white pixel? (So basically, is it possible that one white pixel has a bigger weight than another?)
Because the whole bias thing would make sense this way, but… still… why would one of two identical pixels have a bigger weight than the other? (Again, I’m talking about fully white pixels (0).)

I don’t understand your question completely, but the neural network will decide which pixels are most important and which are not by optimizing the parameters using the optimizer (SGD in this case).
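For instance, in a linear model over flattened 28×28 images, every pixel position gets its own learned weight, so two pixels with the same value can still matter differently. A minimal sketch (the 784 = 28 × 28 layer size is just an MNIST-style assumption):

```python
import torch.nn as nn

# A linear layer over flattened 28x28 images: one weight per pixel position.
layer = nn.Linear(28 * 28, 1)
print(layer.weight.shape)  # torch.Size([1, 784])

# After training, some of those 784 weights end up large (important pixels)
# and others near zero (unimportant ones) -- SGD decides which is which.
```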

Since you’re using the word “pixel”, I’m assuming you’re referring to computer vision and convolutions. In that case, bias is applied channel-wise, so there’s no individual bias for each pixel.
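You can check this in PyTorch (a quick sketch, not from the lesson):

```python
import torch.nn as nn

# A convolution with 3 input channels and 8 output channels:
# the bias has one value per output channel, not one per pixel.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
print(conv.bias.shape)  # torch.Size([8])

# Every spatial position within a given output channel shares
# that channel's single bias value.
```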

However, for a simple fully-connected layer where the input is an n-dimensional vector (tabular data, for instance), yes, an input of zero can have a different value after adding the bias than another input of zero because those two inputs aren’t always zero and so their biases are going to be updated differently.

For example, suppose we have two features, the temperature in Beijing and the temperature in New York. More often than not, they have nonzero values, so they’re not going to end up with equal biases, because those biases are updated differently (they’ll end up at, say, two and three respectively). Thus, when the temperature in both cities is zero, applying the biases gives two for Beijing and three for New York.
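Here’s a minimal PyTorch sketch of that idea (the per-feature model, the fake data, and the target offsets of 2 and 3 are all made up to match the example above):

```python
import torch

torch.manual_seed(0)

# Toy model where each feature gets its own weight and bias:
# out_i = w_i * x_i + b_i. Feature 0 plays "Beijing", feature 1 "New York".
w = torch.ones(2, requires_grad=True)
b = torch.zeros(2, requires_grad=True)    # both biases start out equal

x = torch.randn(100, 2) * 10              # fake "temperatures"
y = x + torch.tensor([2.0, 3.0])          # targets offset by 2 and 3

opt = torch.optim.SGD([w, b], lr=0.01)
for _ in range(500):
    loss = ((w * x + b - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(b)  # roughly tensor([2., 3.]): the biases are no longer equal, so an
          # input of 0 now maps to about 2 for one feature and 3 for the other
```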

Please do let me know if more clarification is needed!

I think I understand the concept now, thank you for your comprehensive clarification!

No problem!
