The magic of gradient descent

@jeremy you’ve tried to emphasise a couple of times how CNNs use simple maths to do cool things, and that the magic is all in gradient descent, but you’ve also mentioned that some folks are still a little befuddled.

In my head the whole thing works because we’re taking tiny jumps down the gradient, but we’re taking a huge number of those individual jumps. So for me the magic is that we can do so many jumps/computations in such a short period of time - optimising millions of parameters in seconds.
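To make the “tiny jumps” concrete, here’s a toy sketch in plain Python (a single made-up parameter and loss, not a real network): each update is just “compute the gradient, take a small step against it”, repeated many times.

```python
# Toy gradient descent on a one-parameter loss: L(w) = (w - 3)^2.
# Every step is the same tiny jump: gradient, then a small step downhill.
w = 0.0
lr = 0.1  # learning rate: how tiny each jump is
for step in range(100):
    grad = 2 * (w - 3)  # dL/dw
    w -= lr * grad      # one small jump down the gradient

print(round(w, 4))  # w ends up very close to the minimum at w = 3
```

Nothing in the loop is more advanced than a multiply and a subtract - the result comes from doing it a hundred times (or, in a real model, billions).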

Maybe if there were a way to visualise how many individual parameter updates happen as we train a model, that might help people grasp what the “magic” of DNNs is? Again, my understanding is that the foundational maths isn’t crazy complex (although I’m sure the implementation is); it’s the fact that we can do it so quickly that makes it work.
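A quick back-of-envelope count of those updates (the parameter count, batches per epoch, and epoch count below are all hypothetical round numbers, just to show the scale):

```python
# Hypothetical training run, purely illustrative numbers.
params = 25_000_000       # parameters in a mid-sized CNN
steps_per_epoch = 1_000   # mini-batches per epoch
epochs = 5

# Every step updates every parameter once.
updates = params * steps_per_epoch * epochs
print(f"{updates:,} individual parameter updates")
```

That works out to 125 billion individual updates in one modest training run, which is the kind of number that makes “lots of tiny jumps” feel less like magic and more like brute arithmetic at speed.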

That’s an interesting perspective. You may be on to something :slight_smile:

(Although I should add that the implementation, at least in software, isn’t that complex either. It’s just multiplying things together and adding them up! The hardware is pretty impressive though…)
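For anyone who wants to see the “multiplying things together and adding them up” literally, here’s one neuron’s worth of it, with made-up inputs and weights:

```python
# One unit of a neural net layer is just a dot product:
# multiply each input by its weight, then add the products up.
x = [0.5, -1.0, 2.0]  # made-up inputs
w = [0.2, 0.4, 0.1]   # made-up weights

out = sum(xi * wi for xi, wi in zip(x, w))
print(round(out, 10))  # 0.5*0.2 + (-1.0)*0.4 + 2.0*0.1 = -0.1
```

A whole layer is many of these in parallel, and a whole network is layers of them stacked - which is why the software side really is that simple.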