Preview release: "Matrix Calculus for Deep Learning"

My friend @parrt and I (mainly Terence!) have written this new article: The Matrix Calculus You Need For Deep Learning. We’re planning to make it public tomorrow. If you’re interested in checking it out, we’d love to get your feedback before then.

Are there any bits that don’t make sense to you? Is the context/motivation clear? Does it render OK on your device? Does the PDF version linked there look OK in your PDF reader (currently it has known problems on Mac)?

(Of course, please don’t mention this on social media etc before we officially release it - I’ll post here when we do.)


Hi Jeremy, thank you very much for the nice work and for sharing it.

I think some derivative formulas are hard to read without zooming in a couple of times (maybe I’m just tired :grinning:). Also, you might use equation numbering for some of the important equations.

What device and OS are you using? Can you press Ctrl + (or Cmd +) to make the text bigger? Can you tell me a couple of equations you’re finding a little small to read?

Thanks for this great collaboration @jeremy and @parrt. I will definitely finish reading it this week, but for review purposes I just glanced at every page in the Chrome browser.

It looks great: clean and well organized. Only in some tables does the math notation look crowded; a bit more spacing between rows might give the reader some relief (in the PDF viewed in Chrome). I would personally prefer the HTML version.

For example:

When should I share the links with my connections?

Thanks again!

I’ll post here when we release it - thanks @kcturgutlu!

Too tired to dig deeply, but at first blush this looks great. To nitpick, if I may, the font is maybe a bit pale…

Sorry for the obvious noob question and thank you for this post.

I don’t understand the following notation


Specifically, why is there an abs(x) on top of the summation. From my understanding the rectifier passes only positive values of the affine function, computed over all x_i. So I don’t understand what the abs(x) is supposed to signify.


Very understandable question! That notation |x| refers to the count of items in x in this context. There’s a notation reference at the end of the paper BTW :slight_smile:

Of course, |x| has multiple different meanings, and there are multiple different ways of expressing a count.

Because math. sigh
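To make that concrete in code (my own sketch, not from the paper): in this context |x| is just len(x), the number of elements, so the summation runs over that many terms before the max(0, …) is applied. The weights below are made up purely for illustration.

```python
import numpy as np

def relu_unit(w, x, b):
    # |x| in the paper's notation is the count of elements, i.e. len(x);
    # the summation runs i = 1 .. |x| over the element-wise products.
    z = sum(w[i] * x[i] for i in range(len(x))) + b
    return max(0.0, z)

w = np.array([1.0, -2.0, 0.5])   # hypothetical weights, for illustration only
x = np.array([3.0, 1.0, 2.0])
print(relu_unit(w, x, 0.0))      # max(0, 1*3 + (-2)*1 + 0.5*2) = 2.0
```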


It can’t be made any simpler than this…

Thanks a lot…

There’s one more PDF which should be read after this one… (I’ve already read it, so just passing it along.)

Convolution Mathematics.pdf (858.0 KB)


Both go hand in hand…


Hi Jeremy, reviewed on an Android device (Note 8) and it rendered fine, no issues at all.
The need for a paper such as this completely resonates with me, as I have learnt more by ‘doing’ the code in your course than by studying the underlying math, as you mention in the paper. So it’s good to see the mathematical concepts, although some parts are new and some complicated for me, and they will require a number of repeat reads, as the paper itself suggests.
I haven’t come across a paper that talks about mathematical notation and then links it to what the code looks like. I’m not sure that is entirely possible or needed, but it would be great to reference mathematical notation to code with the aim of improving comprehension; maybe as a side note in the resources section.
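As a tiny illustration of the kind of notation-to-code mapping suggested above (my own sketch, not from the paper): the gradient symbol ∇f corresponds to a vector of partial derivatives, which can be approximated numerically with a finite-difference loop.

```python
import numpy as np

def grad(f, x, h=1e-6):
    # Numerical reading of the nabla notation: (∇f)_i = ∂f/∂x_i,
    # approximated here with central differences.
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

f = lambda v: v[0] ** 2 + 3 * v[1]       # f(x, y) = x^2 + 3y
print(grad(f, np.array([2.0, 1.0])))     # ∇f = [2x, 3] ≈ [4., 3.] at (2, 1)
```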

Thank you for sharing this.

This is amazing beyond belief. It made it into Mendeley on my phone and it looks beautiful.

I seem to overdo it when I express my appreciation of things online, and then I come across as not a very serious person.

But just wanted to say. OMG. This is amazing. Many moments will be spent perusing this in great detail :slight_smile:


The matrix calculus and the bit leading into gradient descent look good (I ended up skimming it, though). Before that, I have a few comments:

In the table under review: scalar derivative rules, quotient rule: I don’t follow your equation. The example isn’t right: d/dx(x/3) is 1/3, but you show an x^2 equation. The quotient rule should be (f′·g − f·g′)/g^2, which gives (2x·3x − 3x^2)/9x^2 = 1/3. You’re not using it elsewhere, so maybe just remove it?
<change 1 to 2>: 1st equation under Matrix Calculus :… 2 x 1 = 1

Nitpicky or cosmetic:
In the table under review: scalar derivative rules, product rule, I’d show an explicit 1 to make the d/dx(x) factor visible, i.e. x^2·1 + x·2x.
<insert “at”>: “which you can find at the Khan Academy differential calculus course.”
In the Jacobian section, the spacing in the three ∂/∂x_i f(x) matrices makes it hard to discern the columns unless you knew they were there (it becomes obvious when they’re zeroed, so maybe it doesn’t matter).

Is |x| better than m’s and n’s? (I’ve always stuck with the latter, as you see them at the bottom right of matrices.)
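The quotient-rule point above can be checked numerically. A minimal sketch (the helper and the sample functions are my own, just for verification): with f(x) = x² and g(x) = 3x, f/g simplifies to x/3, so the rule should give 1/3 everywhere (x ≠ 0).

```python
def quotient_rule(f, fp, g, gp, x):
    # (f'(x)·g(x) - f(x)·g'(x)) / g(x)^2
    return (fp(x) * g(x) - f(x) * gp(x)) / g(x) ** 2

# f(x) = x^2 and g(x) = 3x, so f/g = x/3 and the derivative is 1/3:
f, fp = lambda x: x ** 2, lambda x: 2 * x
g, gp = lambda x: 3 * x, lambda x: 3.0
print(quotient_rule(f, fp, g, gp, 5.0))  # (10·15 - 25·3) / 225 = 1/3
```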

Yep, I too feel that there’s an error in the table.

We need to use the u/v rule there…

I reviewed it in Chrome and Mozilla Firefox, both on PC and Android. It’s great, no issues.
The writing is wonderful and quite informative too.
Thanks @jeremy and @parrt for this great article.

A few minor comments on matrix calculus:

  • The scalar derivative rules are a bit crammed; some spacing around them would make them easier to read.
  • I have heard of a “weight vector”, but what is an “edge” weight vector?
  • Beginners are often unsure of how to denote the dimensions of a matrix. Could it mention that m × n means number of rows × number of columns?
  • Is it possible to put this line earlier in the document, where bold notation is first used: “Lowercase letters in bold font such as x are vectors and those in italics font like x are scalars”?
  • Explain that the upside-down triangle ∇ is called “del” or “nabla” (an inverted capital delta).
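On the rows × columns point above: this matches how numpy reports matrix shapes, which might be a handy anchor for beginners. A quick illustration:

```python
import numpy as np

# An m x n matrix has m rows and n columns;
# numpy reports shape as the tuple (rows, columns).
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.shape)  # (2, 3): a 2 x 3 matrix, i.e. 2 rows and 3 columns
```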

Thanks @jeremy! I reviewed it in chrome on MacBook. Fairly clear and insightful illustration!

hiya. Is there an external link for that? I could link it in.

thanks! fixed. pushing as I go…

I think there is a typo in the matrix just before “Welcome to matrix calculus!”.
The derivative of g(x,y) with respect to x is 2, not 1.