Hi everyone,
I’m really enjoying the course so far, and my understanding is slowly increasing with time, but there’s a (huge?) conceptual ‘thing’ that I’m not quite getting. I don’t even know how to Google the issue, so I wanted to ask here.
If this is the wrong place to ask this question, please feel free to delete. Apologies in advance.
So:
In lesson 2, Jeremy demonstrates a simple linear function (y = ax + b), and it’s easy to understand that, given one value (x, the temperature), the other value (y, ice cream sales) can be predicted once the coefficients a and b have been tweaked to fit the data.
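To show the part I do follow, here is a minimal sketch of that "tweaking" in plain NumPy. It is not the lesson's code: the data is made up, `fit_line` is a name I invented, and I'm assuming mean squared error as the loss and plain gradient descent as the way the coefficients get nudged.

```python
import numpy as np

# Made-up data for illustration: temperature (x) and ice cream sales (y).
x = np.array([15.0, 18.0, 21.0, 24.0, 27.0, 30.0])
y = 3.0 * x + 10.0  # generated from a = 3, b = 10, so the "right" answer is known

def fit_line(x, y, lr=0.1, steps=2000):
    """Tweak a and b by gradient descent to reduce the mean squared error."""
    # Standardise x so that one learning rate works for both coefficients.
    mu, sigma = x.mean(), x.std()
    xs = (x - mu) / sigma
    a, b = 0.0, 0.0
    for _ in range(steps):
        err = (a * xs + b) - y           # how wrong each prediction is
        a -= lr * 2 * (err * xs).mean()  # nudge a against its gradient
        b -= lr * 2 * err.mean()         # nudge b against its gradient
    # Convert back to coefficients for the original (unscaled) x.
    return a / sigma, b - a * mu / sigma

a, b = fit_line(x, y)
print(a, b)
```

Starting from a = 0 and b = 0, the loop should end up very close to the a = 3 and b = 10 that generated the data.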
But, I can’t really conceptualise how it applies to more complex things like classifying images.
The neural network
What is literally happening? Is the neural network just conceptual, rather than there actually being some digitised form of a neural network? Or does it ‘exist’?
Conceptual vs. Real
The relationship between what is conceptually occurring (i.e., for the animal breed categoriser: “The network is learning to spot angles, corners, then shapes, then things like eyes and finally breeds”) and what is really/literally occurring is confusing me, i.e., what we mean when we say that it’s “finding corners”.
Is the process written down anywhere in a digestible form?
By this I mean: I’m not clear about how it begins, ends and what’s really happening in between. I’m not looking for anybody to spoon-feed this to me, rather just a pointer like “Go here, read this”.
My fudged understanding / guesstimate(?), which is almost certainly very wrong, is that:
- The image is fed into the neural network’s inputs, pixel by pixel (so every pixel is a separate input)
- The pixels are surely(?) grouped, so that they form a context(?), otherwise they’re just individual pixels from which no meaning can be derived
- ???
- Probabilities are calculated for the various labels applying to the image, and argmax is used for the most likely outcome(s)
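To make that guess concrete, here is a rough sketch of how I imagine the steps fitting together. Everything in it is an assumption on my part: the labels are hypothetical, the "image" is random numbers, the weights are random (nothing has been trained), and I've filled the "???" with two layers of weighted sums just to have something to point at.

```python
import numpy as np

rng = np.random.default_rng(0)

labels = ["beagle", "pug", "siamese"]  # hypothetical labels
image = rng.random((8, 8))             # stand-in for a tiny greyscale image

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(image, w1, w2):
    x = image.reshape(-1)           # every pixel becomes one input number
    hidden = np.maximum(0, x @ w1)  # layer 1: weighted sums of pixels, then ReLU
    scores = hidden @ w2            # layer 2: weighted sums of layer-1 outputs
    return softmax(scores)          # one probability per label

w1 = rng.normal(size=(64, 16))  # random, untrained weights (assumption)
w2 = rng.normal(size=(16, 3))

probs = predict(image, w1, w2)
best = labels[int(np.argmax(probs))]  # argmax picks the most likely label
print(probs, best)
```

If this is roughly right, then the only things "saved" are the numbers in `w1` and `w2`, and the only thing "passed" between layers is an array of numbers like `hidden`. That is the part I would like confirmed or corrected.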
I think what I’m not ‘getting’ is what is really saved/known at each layer in the network, and what is even ‘passed’ to each layer of the network.
If one of the earlier layers is ‘finding diagonal colours’ and another is ‘spotting corners’, what does that really mean? Is there a hidden internal process where ‘labels’ are created (like ‘corner’), a group of pixels is read in together, a pattern is spotted, and ‘corner’ is ‘activated’ as a match, which is then passed on to another layer?
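My best guess at what ‘activated’ could mean, in case it helps someone correct me: a small grid of weights is slid over the image, and the output number is large wherever the pixels underneath match the pattern in the weights. The sketch below uses a filter I wrote by hand for a vertical edge; I'm assuming a real network would learn such weights rather than be given them, and `slide_filter` is just a name I made up.

```python
import numpy as np

# A 6x6 image: dark (0) on the left half, bright (1) on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written 3x3 filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

def slide_filter(image, kernel):
    """Slide the kernel over the image and record how well each patch matches."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # a small group of pixels
            out[i, j] = (patch * kernel).sum()  # weighted sum of that group
    return np.maximum(0, out)                   # ReLU: keep positive matches only

activation = slide_filter(image, kernel)
print(activation)
```

Each row of `activation` comes out as [0, 3, 3, 0]: zero over the flat regions and 3 where the filter straddles the edge. Nothing called ‘edge’ is stored anywhere. There is only a grid of numbers, which I assume is what the next layer receives.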
If I’m able to understand what’s happening in more depth, I think the relationship between the math(s) and how I’m supposed to envisage what’s conceptually going on will make more sense.
Thanks, and apologies for such an elementary question.