Resources (and mindset) that helped me better understand backprop

Hey everyone,

Since it’s relevant to lesson 8, I just wanted to share some thoughts and resources that helped me improve my understanding of backpropagation.

  • Jeremy mentioned Khan Academy in lesson 8, and I’d like to second that recommendation.
    The scalar derivative rules and multi-variable derivatives lessons are good. For unary functions like trig and log, you can go here.
    Note: Actually doing some problems by hand was key. Khan Academy has a bunch of quiz questions you can try. The benefit is that you can check your answers and have some peace of mind that you’re on the right track! You can also use PyTorch to check your answers later on when you get to matrix calculus. More on that further down.
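A minimal sketch of the PyTorch-checking idea (the function f below is just an arbitrary example of mine, not one of the Khan Academy problems):

```python
# Check a hand-computed derivative against PyTorch autograd.
# f(x) = x^2 * sin(x); by the product rule, f'(x) = 2x*sin(x) + x^2*cos(x)
import torch

x = torch.tensor(1.5, requires_grad=True)
f = x**2 * torch.sin(x)
f.backward()  # populates x.grad with df/dx

x0 = x.detach()
hand = 2 * x0 * torch.sin(x0) + x0**2 * torch.cos(x0)  # whiteboard answer
print(torch.allclose(x.grad, hand))  # True
```

The same pattern scales up: any scalar expression you can differentiate by hand, you can verify in a few lines.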

  • A lot of the multi-variable calculus lessons on Khan Academy were taught by Grant Sanderson. Grant has a great channel on YouTube called 3blue1brown where he goes over lots of cool math topics including calculus and linear algebra. Grant puts a lot of effort into helping you visualize mathematical concepts. As an added bonus, he actually has 4 videos on (fully connected) neural nets that end with a walkthrough of backpropagation.

  • I noticed a lot of explanations of backprop glossed over the matrix-y bits. The matrix calculus write-up that Jeremy mentioned was key to closing that gap. For me it was like a Rosetta Stone for understanding backprop.
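To make the matrix-y bits concrete, here is a sketch of checking one matrix-calculus identity numerically (the shapes are arbitrary choices of mine): for y = x @ W and a scalar loss L = y.sum(), the gradient works out to dL/dW = x.T @ dL/dy, where dL/dy is all ones.

```python
# Verify dL/dW = x.T @ dL/dy against autograd for y = x @ W, L = y.sum().
import torch

x = torch.randn(4, 3)                      # batch of 4 inputs, 3 features
W = torch.randn(3, 2, requires_grad=True)  # weight matrix
loss = (x @ W).sum()
loss.backward()                            # populates W.grad

manual = x.t() @ torch.ones(4, 2)          # hand-derived gradient
print(torch.allclose(W.grad, manual))      # True
```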

  • When it comes down to it, you usually won’t be computing gradients manually for every operation. That’s where reverse-mode automatic differentiation comes in (autograd in PyTorch). Like many things in deep learning, that’s just a fancy name for something pretty simple. It’s essentially the solution we wound up with by the end of class, just a bit more generalized. I quite like the explanation here, and there is also a more coding-oriented demo. I’d also recommend going through the PyTorch 60 Minute Blitz tutorial, which has a section on autograd.
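To show just how simple the core idea is, here is a toy reverse-mode AD sketch of my own (not how PyTorch is actually implemented): each variable remembers its parents along with a local derivative, and backward() walks the graph in reverse, applying the chain rule and accumulating gradients.

```python
# Toy reverse-mode automatic differentiation.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, grad=1.0):
        self.grad += grad        # accumulate, since a Var can be used twice
        for parent, local in self.parents:
            parent.backward(grad * local)

# d/da (a*b + a) at a=2, b=3 is b + 1 = 4; d/db is a = 2
a, b = Var(2.0), Var(3.0)
y = a * b + a
y.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Real systems add a proper topological ordering and vectorized ops, but the chain-rule bookkeeping is the whole trick.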

  • Another thing I did was implement a very small, fully-connected 2-layer neural net in PyTorch, being sure to set requires_grad=True where necessary. I initialized x, y, the weights, and the biases to arbitrary values in the 0-9 range. After the setup, I went to the whiteboard and computed the gradients with respect to the weights and biases for a batch size of one. I then checked my answers against the .grad attributes of the tensors after calling backward().
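Something along those lines might look like this (the specific values are arbitrary, and the ReLU activation and MSE loss are my assumptions, since the post doesn’t name them):

```python
# Tiny fully-connected 2-layer net, batch size 1, for whiteboard-checking.
import torch

x = torch.tensor([[1., 2.]])                                # input, shape (1, 2)
y = torch.tensor([[3.]])                                    # target
w1 = torch.tensor([[4., 5.], [6., 7.]], requires_grad=True)
b1 = torch.tensor([8., 9.], requires_grad=True)
w2 = torch.tensor([[1.], [2.]], requires_grad=True)
b2 = torch.tensor([3.], requires_grad=True)

h = torch.relu(x @ w1 + b1)        # hidden layer
pred = h @ w2 + b2                 # output layer
loss = ((pred - y) ** 2).mean()    # MSE loss
loss.backward()

# Compare these against your whiteboard results:
print(w1.grad, b1.grad, w2.grad, b2.grad)
```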

  • And lastly, here are some key things I’ve learned reading SIGGRAPH and ML research papers over the years. This is general stuff that applies to diving into any advanced topic and echoes some of what Jeremy said in lesson 8.

    • I never soak up all the info from a very mathy article on the first read-through. In fact, I’ve come back to papers that I read years ago and discovered I completely missed some key nuance. So I usually skim through the first time; on the second pass I really dig into the gory math. Then I try to implement as much of it in code as I can, and finally I come back to the paper again to check my understanding. At this point I would look for any reference implementation to check my results against.

    • I learned early on to be honest with myself about what I don’t know. Feynman put it best. As soon as I come across some new notation or concept that I don’t understand, I just google it, along with anything else I need in order to understand it. I used to get stuck in the mindset of “I think I understand this!” or “I should understand this!?”, which deluded me into thinking “I do understand this!”, and then I’d plow on through. Usually that would result in me throwing up my hands and giving up at some point, or missing some key detail and having to go back to square one. Patience is key. Rome wasn’t built in a day.

    • With advanced math it’s easy to avoid actually going to a whiteboard or piece of paper and working through the problem by hand; it can seem scary or tiring just to think about. I have to fight the urge to skip it. Oftentimes things click for me when I work through them on my whiteboard. It’s often a struggle, but if you persist, you will be so much better for it.

So I hope this list is helpful. If anything, it was helpful for me to think this through and write it all down. I would be very curious to hear other people’s advice and resources on this topic!


awesome write up, thx

Thanks for your post!

A thing that helped me a lot to get more intuition about backprop (and the chain rule) is to think about the computation graph.
The following videos, from Andrew Ng’s deep learning course, give a very simple and great visual explanation:
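The sort of computation-graph walkthrough those videos do can also be reproduced numerically in a few lines (the specific graph and numbers below are my own illustration):

```python
# Computation graph for J = 3 * (a + b*c): forward pass, then a backward
# pass applying the chain rule edge by edge.
a, b, c = 5.0, 3.0, 2.0

# forward pass through the graph
u = b * c        # 6.0
v = a + u        # 11.0
J = 3 * v        # 33.0

# backward pass: multiply local derivatives along each path
dJ_dv = 3.0
dJ_du = dJ_dv * 1.0         # v = a + u
dJ_da = dJ_dv * 1.0         # 3.0
dJ_db = dJ_du * c           # 6.0
dJ_dc = dJ_du * b           # 9.0
print(dJ_da, dJ_db, dJ_dc)  # 3.0 6.0 9.0
```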
