# Why is the bias added to the whole image rather than to each pixel?

Hello, as the title says, I was reading through part 1 of the course and I've reached the MNIST notebook. The only thing I can't quite grasp is why the bias is added to the weighted sum of each picture and not to each pixel.

An example is:

```python
(train_x*weights.T).sum() + bias
```

Why is this correct and not this:

```python
(train_x*weights.T + bias).sum()
```

Lastly, I don't get how the dot product helps us calculate the weighted sum better than a for loop.

Also, I would like some further clarification on why we choose the function `y = a*x + b` to predict images. Is that something standard? Can we use others?

Hi elemos,

Try running the code on Google Colab or Kaggle, and inspect the shapes of `train_x`, `weights.T`, and `bias`. Can you figure out why `(train_x*weights.T + bias).sum()` won't work?
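A minimal sketch of that shape inspection, using made-up random tensors with the shapes the MNIST notebook uses (28×28 = 784 pixels per image; the 10-image batch here is just a stand-in):

```python
import torch

# Stand-in tensors with the notebook's shapes (values are random).
train_x = torch.randn(10, 784)   # 10 flattened 28x28 images
weights = torch.randn(784, 1)    # one weight per pixel
bias = torch.randn(1)            # a single bias for the whole image

print(train_x.shape)                  # torch.Size([10, 784])
print(weights.T.shape)                # torch.Size([1, 784])
print(bias.shape)                     # torch.Size([1])

# The elementwise product broadcasts (10, 784) * (1, 784) -> (10, 784),
# so .sum() collapses ALL images into one scalar, not one per image.
print((train_x * weights.T).shape)    # torch.Size([10, 784])
print((train_x * weights.T).sum().shape)  # torch.Size([]) -- a scalar
```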

The answer is in the book:

> While we could use a Python for loop to calculate the prediction for each image, that would be very slow. Because Python loops don’t run on the GPU, and because Python is a slow language for loops in general, we need to represent as much of the computation in a model as possible using higher-level functions. In this case, […] matrix multiplication
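To see that the loop and the matrix multiplication compute the same numbers, here is a sketch (shapes assumed from the notebook; the data is random):

```python
import torch

train_x = torch.randn(10, 784)   # 10 made-up flattened images
weights = torch.randn(784, 1)
bias = torch.randn(1)

# Slow: one Python-level iteration per image.
loop_preds = torch.stack([(x * weights.T).sum() + bias for x in train_x])

# Fast: one matrix multiplication handles every image at once.
matmul_preds = train_x @ weights + bias

# Same numbers, up to floating-point rounding.
print(torch.allclose(loop_preds.squeeze(), matmul_preds.squeeze(), atol=1e-3))
```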

This is how a standard linear layer is computed. You can see later in the notebook that `nn.Linear` (from PyTorch) is used instead of `linear1`. There are many different PyTorch models/layers that can be used; convolutions, for instance, are used a lot in computer vision. But you'll get there as soon as you dive deeper into the book.


Hello and thanks for the answer. Regarding the first part, I realised I meant to write

```python
((train_x*weights.T) + bias).sum()
```

so that the bias gets added to each weighted pixel.
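For a single image this sketch (with made-up values) shows why the two expressions differ: adding the bias inside the sum means it gets added once per pixel, i.e. 784 times instead of once:

```python
import torch

x = torch.randn(784)          # one made-up flattened image
w = torch.randn(784)          # one weight per pixel
bias = torch.tensor(0.5)      # a single scalar bias

outside = (x * w).sum() + bias       # bias added once
inside = ((x * w) + bias).sum()      # bias added 784 times

# The two results differ by 783 extra copies of the bias.
print(inside - outside)  # approximately 783 * 0.5 = 391.5
```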

I managed to figure out the second part about the matrix multiplication after I understood the shape of the tensor and how matrix multiplication works.

As for the third part, I get that it's a standard choice for this example, but I can't quite grasp how summing the weighted pixels gives us a prediction. That is, if we multiply each pixel by its respective weight and then sum the results from all the pixels, how does that constitute a prediction? That's what I really had in mind; I just phrased it badly.

Thank you for your answers on the rest, however.

Hello again. The bias is created as a single number: `bias = init_params(1)`. At first I thought your version would not work because the shapes don't match, but it would still work because of broadcasting; the formula would just look different then.

The `w*x` in the equation `y = w*x + b` is a matrix multiplication. In Python, matrix multiplication is done with the `@` operator, as seen in `def linear1(xb): return xb@weights + bias`. The code `(train_x*weights.T).sum()` mimics what happens when you use the matrix multiply `train_x@weights`, but it only works for a single image. If you wanted to do it that way for the whole dataset, you would have to use a for loop. With matrix multiplication you can just write `train_x@weights` and get a prediction for every image at once.
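A sketch of that `linear1` function with assumed notebook shapes (random stand-in data), showing that it returns one prediction per image and that the scalar bias broadcasts across them:

```python
import torch

weights = torch.randn(784, 1)   # one weight per pixel
bias = torch.randn(1)           # a single scalar bias

def linear1(xb):
    # xb @ weights has shape (n_images, 1); bias broadcasts over it.
    return xb @ weights + bias

train_x = torch.randn(10, 784)  # 10 made-up flattened images
preds = linear1(train_x)
print(preds.shape)  # one raw score per image
```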

Try rereading the chapter and rewatching the videos; hopefully my words will make sense then.

`(train_x*weights.T).sum() + bias` gives you a single number, which I understand is far from being a prediction at first glance. What does it mean?! This is a binary classification example: is it a 3, yes or no? So how does 23.4543 help with that? It doesn't, really. That's why the sigmoid is introduced: to squash this number into the range between 0 (false) and 1 (true).