Why does one use a labeled training image to learn?

Finally found a way to post, so here it is, folks! I need your help with a nagging question about training a model. It’s my first take on deep learning, so my question may come as a surprise to those who already know the answer. Please help; otherwise I am lost on the intuition for how a model learns!

MNIST basics - I have both a training set (train_x, train_y) and a validation set (valid_x, valid_y).
I’ve represented each data set as a stacked tensor, with each 28 x 28 image flattened into a row of 784 elements, ready for matrix multiplication.
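
To make the flattening step concrete, here is a minimal sketch in plain Python. The pixel values are made-up placeholders rather than real MNIST data, and nested lists stand in for tensors. With a tensor library, a reshape call such as PyTorch's `view(-1, 28*28)` would typically do the same job.

```python
# A 28 x 28 "image" as nested lists; the pixel values are made-up placeholders.
rows, cols = 28, 28
image = [[((r * cols + c) % 256) / 255 for c in range(cols)] for r in range(rows)]

# Flatten row by row into one long vector, ready to be multiplied by weights.
flat = [pixel for row in image for pixel in row]

print(len(flat))  # 784, because 28 * 28 = 784
```

Element `flat[r * 28 + c]` is the pixel that sat at row `r`, column `c` of the original image, so nothing is lost by flattening.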

So far so good and every step makes :blush: sense.

Next, I initialize the weights to random values and now onto the very first step to begin (machine) learning. I am confused about this very first step.

(train_x[0]*weights.T).sum() + bias

The first image (train_x[0]) is multiplied by the weights w, and the bias b is added, which gives a :roll_eyes: prediction. Why can I say this gives a prediction? train_x[0] is already an image with a known label, train_y[0]. It would’ve made sense if I had a blank image - 784 pixels initialized to 1.0 - as a starting point. Then multiplying by the weights, doing the gradient descent, stepping the weights, and checking whether my new version of the image looks like train_x[0] (using MSE) would show that it had learnt (the weights) how to label it as train_y[0].

Once I multiply by the randomized weights and do the gradient descent, am I not just distorting the image first and then trying to piece it back together?

Either I am missing some fundamental reasoning behind the training set or there is a simple reason that makes sense. I’ve been scratching my head over this :woozy_face:

From the beginning. We have things (like pictures) and labels. Lots of them.
We take the thing, multiply it by a matrix (the weights), and add a matrix (the bias) to give a suggested label. We compare it with the actual label, and the difference is the loss. We repeat with other things. This is a batch. We summarise the losses and change the two matrices. We repeat lots of times, using all the data n times (n = the number of epochs). Eventually we take a completely different, never-before-seen set of things and labels and, using the matrices, we predict the labels. This gives us the validation success.
The things never change unless you produce other things by, for example, rotating the images.
Regards Conwyn.
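
The loop described above can be sketched in plain Python. This is only an illustration under stated assumptions: the toy data (four random numbers per "thing", with a made-up labelling rule), the helper names (`make_example`, `predict`, `mse`, `accuracy`), the MSE loss, and the 0.5 decision threshold are all hypothetical choices, not the MNIST setup from the question. The point is that only `weights` and `bias` are ever updated; the training inputs are read but never modified.

```python
import random

random.seed(0)

def make_example():
    # Hypothetical toy "thing": 4 numbers. Made-up rule for the label:
    # 1.0 if the first two numbers outweigh the last two, else 0.0.
    x = [random.random() for _ in range(4)]
    y = 1.0 if x[0] + x[1] > x[2] + x[3] else 0.0
    return x, y

def predict(x, weights, bias):
    # Same shape of computation as (x * weights).sum() + bias.
    return sum(xi * wi for xi, wi in zip(x, weights)) + bias

def mse(data, weights, bias):
    return sum((predict(x, weights, bias) - y) ** 2 for x, y in data) / len(data)

def accuracy(data, weights, bias):
    hits = sum((predict(x, weights, bias) > 0.5) == (y == 1.0) for x, y in data)
    return hits / len(data)

train = [make_example() for _ in range(200)]
valid = [make_example() for _ in range(50)]   # never used for updates
train_snapshot = [(list(x), y) for x, y in train]

weights = [random.uniform(-1, 1) for _ in range(4)]  # random starting point
bias = 0.0
lr, batch_size, epochs = 0.1, 20, 100

loss_before = mse(train, weights, bias)

for epoch in range(epochs):
    for start in range(0, len(train), batch_size):
        batch = train[start:start + batch_size]
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in batch:
            err = predict(x, weights, bias) - y   # suggested label vs. actual label
            for j, xj in enumerate(x):
                grad_w[j] += 2 * err * xj / len(batch)
            grad_b += 2 * err / len(batch)
        # Only the weights and bias change; the batch itself is untouched.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b

loss_after = mse(train, weights, bias)
print("training loss:", loss_before, "->", loss_after)
print("validation accuracy:", accuracy(valid, weights, bias))
print("inputs unchanged:", train == train_snapshot)
```

The first call to `predict` with random weights is still a prediction, just a poor one. Training improves it by adjusting the weights, not by altering the image.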

@Conwyn thanks :+1: for your insight. I think I see what you mean. Plain English helps a ton to understand this a little :pleading_face: more clearly.

To paraphrase (for my own internalization of the concept, intentionally ignoring bias for now) - what does one mean when one says a model learns? Learning is basically the process of arriving at a set of weights. These weights, when multiplied by a thing the model has already seen, should ideally match the label attached to that thing (because, well, the model saw the thing during training).

In my question, I was taken in by the statement that the very first image, train_x[0], multiplied by the weights gives a prediction. As a matter of fact, all the other training images have to be seen once so that multiplying train_x[0] by the resulting weights would now give a higher number for that label than for the other labels.

But for things that the model has not yet seen, multiplying valid_x[0] by the weights should give back the label with the highest number among the labels it knows - the prediction. Am I understanding it correctly? (For simplicity, I’ll stick to single-label classification only; I know not everything can be one or the other, so some variation with multiple labels should exist.)
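
Here is a small sketch of "the label with the highest number", assuming one weight vector and one bias per label. The weights are hand-picked for illustration rather than learned, and `scores_for` and `predict_label` are hypothetical helper names.

```python
def scores_for(x, weights_per_label, biases):
    # One score per label: the input multiplied by that label's weights, plus its bias.
    return [
        sum(xi * wi for xi, wi in zip(x, w)) + b
        for w, b in zip(weights_per_label, biases)
    ]

def predict_label(x, weights_per_label, biases):
    scores = scores_for(x, weights_per_label, biases)
    # The prediction is the index of the largest score.
    return max(range(len(scores)), key=scores.__getitem__)

# Hand-picked (not learned) weights for 3 labels and 4 input features.
weights_per_label = [
    [1.0, 0.0, 0.0, 0.0],   # label 0
    [0.0, 1.0, 0.0, 0.0],   # label 1
    [0.0, 0.0, 1.0, 1.0],   # label 2
]
biases = [0.0, 0.0, 0.0]

unseen_x = [0.0, 1.0, 0.0, 0.5]
print(scores_for(unseen_x, weights_per_label, biases))     # [0.0, 1.0, 0.5]
print(predict_label(unseen_x, weights_per_label, biases))  # 1
```

The same function works for a training input and for an unseen one. The only difference is that the weights were tuned using the training inputs.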

I gather that arriving at these weights (by tweaking them a little, deciding whether to increase or decrease each one based on whether the overall loss goes up or down) is why one has to begin by multiplying train_x[0] by some starting weights.
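
The "tweak a weight and see whether the loss goes up or down" idea can be written out literally, as a rough sketch. The two data points, the starting weights, the nudge size `eps`, and the learning rate are all made up for illustration. In practice, frameworks such as PyTorch compute these slopes analytically with automatic differentiation instead of nudging each weight in turn.

```python
def loss(weights, bias, data):
    # Mean squared error between predictions and labels.
    total = 0.0
    for x, y in data:
        pred = sum(xi * wi for xi, wi in zip(x, weights)) + bias
        total += (pred - y) ** 2
    return total / len(data)

def nudge_gradient(weights, bias, data, eps=1e-4):
    # For each weight: nudge it up slightly and measure how the loss responds.
    base = loss(weights, bias, data)
    grads = []
    for j in range(len(weights)):
        nudged = list(weights)
        nudged[j] += eps
        grads.append((loss(nudged, bias, data) - base) / eps)
    return grads

# Made-up data: two inputs with two features each, plus their labels.
data = [([1.0, 2.0], 1.0), ([2.0, 0.0], 0.0)]
weights, bias, lr = [0.5, -0.5], 0.0, 0.1

grads = nudge_gradient(weights, bias, data)
# Positive slope: increasing the weight raises the loss, so decrease it.
# Negative slope: increasing the weight lowers the loss, so increase it.
new_weights = [w - lr * g for w, g in zip(weights, grads)]

old_loss = loss(weights, bias, data)
new_loss = loss(new_weights, bias, data)
print("slopes:", grads)
print("loss:", old_loss, "->", new_loss)
```

Here the slope for the first weight comes out close to 0.5 and for the second close to -3, so one step lowers the first weight a little and raises the second a lot more.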

You are indeed :face_with_monocle: spot on when you said the things never change unless one intentionally chooses to process the input in some way. @jeremy I was confused about what was going on because I thought we were trying to make the model learn how to create a thing rather than guess that it’s that thing.

But it turns out, it’s focused on guessing; so yeah :upside_down_face: making sense.