My friend @parrt and I (mainly Terence!) have written this new article: The Matrix Calculus You Need For Deep Learning. We’re planning to make it public tomorrow. For anyone interested in checking it out, we’d love to get your feedback before then.
Are there any bits that don’t make sense to you? Is the context/motivation clear? Does it render OK on your device? Does the PDF version linked there look OK on your PDF reader (currently it has known problems on Mac preview.app)?
(Of course, please don’t mention this on social media etc. before we officially release it; I’ll post here when we do.)
Hi Jeremy, thank you very much for the nice work and sharing it.
I think some derivative formulas are hard to read without zooming in a couple of times (maybe I’m just tired). Also, you might consider numbering some of the important equations.
What device and OS are you using? Can you press Ctrl + (or Cmd +) to make the text bigger? Can you tell me a couple of equations you’re finding a little small to read?
Thanks for this great collaboration @jeremy and @parrt. I will definitely finish reading it this week, but for this review I just glanced at every page in the Chrome browser.
It looks great: clean and well organized. The only issue is that in some tables the math notation looks crowded; more spacing between rows might give the reader some relief (for the PDF in Chrome). Personally, I would prefer the HTML version.
Sorry for the obvious noob question and thank you for this post.
I don’t understand the following notation:
Specifically, why is there an abs(x) on top of the summation? My understanding is that the rectifier passes only positive values of the affine function, computed over all x_i, so I don’t understand what abs(x) is supposed to signify.
Very understandable question! In this context, the notation |x| means the number of items in x, not an absolute value. There’s a notation reference at the end of the paper, BTW.
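To connect the notation to code: a minimal Python sketch, assuming the formula in question is a rectified linear unit of the form max(0, Σ_{i=1}^{|x|} w_i x_i + b). The function name `activation` and the sample values are hypothetical. The upper limit |x| is simply `len(x)`.

```python
def activation(w, x, b):
    """Sketch of max(0, sum_{i=1}^{|x|} w_i * x_i + b).

    |x| is the number of items in x, i.e. len(x), not an absolute value.
    """
    assert len(w) == len(x), "w and x must have the same number of items"
    # The sum runs over i = 1..|x|; Python indexes from 0, hence range(len(x)).
    z = sum(w[i] * x[i] for i in range(len(x))) + b
    # The rectifier clips negative values of the affine function to 0.
    return max(0.0, z)

print(activation([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], -2.0))     # prints 4.0
print(activation([1.0, 2.0, 3.0], [-1.0, -1.0, -1.0], 0.0))   # prints 0.0
```

With three items in x, |x| = 3, so the sum has exactly three terms.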
Hi Jeremy, I reviewed it on an Android device (Note 8) and it rendered fine, with no issues at all.
I can completely relate to the need for a paper like this, as I have learnt more through ‘doing’ the code in your course than through learning the underlying math, as you mention in the paper. So it’s good to see the mathematical concepts, although some parts are new to me, some are complicated, and they will require a number of repeat reads, as the paper suggests.
I haven’t come across a paper that explains mathematical notation and then links it to what the code looks like. I’m not sure that is entirely possible or needed, but it would be great to map the mathematical notation to code to improve comprehension, maybe as a side note in the resources section.
The matrix calculus and the bit leading into gradient descent look good (I ended up skimming them, though). On the sections before that, I have a few comments:
errors:
In the table under the scalar derivative rules review, quotient rule: I don’t follow your equation, and the example isn’t right. d/dx(x/3) is 1/3, but you show an x^2 equation… The quotient rule should be (f'g - fg')/g^2, which gives (2x·3x - 3x^2)/(9x^2) = 1/3. You’re not using it elsewhere, so maybe just remove it?
<change 1 to 2>: 1st equation under Matrix Calculus: … 2 x 1 = 1
ticky touchwood or cosmetic:
In the table under the scalar derivative rules review, product rule: I’d show an explicit 1 for d/dx(x), i.e. x^2·1 + x·2x.
<insert “at”> which you can find Khan Academy differential calculus course.
In the Jacobian section, the spacing in the three ∂/∂x_i f(x) matrices makes it hard to discern the columns unless you already know they are there (it becomes obvious once they’re zeroed, so maybe it doesn’t matter).
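The quotient-rule and product-rule examples in the comments above can be sanity-checked numerically. This is a minimal sketch, assuming a central finite difference is an acceptable stand-in for the exact derivative; the helper names (`numeric_derivative`, `quotient_rule`, `product_rule`) and the point x = 2 are hypothetical choices for illustration. It uses f(x) = x^2, g(x) = 3x for the quotient rule and x^2 · x for the product rule.

```python
def numeric_derivative(func, x, h=1e-6):
    """Central finite-difference approximation of d/dx func(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def quotient_rule(f, fprime, g, gprime, x):
    """(f'g - fg') / g^2, evaluated at x."""
    return (fprime(x) * g(x) - f(x) * gprime(x)) / g(x) ** 2

def product_rule(f, fprime, g, gprime, x):
    """f g' + f' g, evaluated at x."""
    return f(x) * gprime(x) + fprime(x) * g(x)

x0 = 2.0
# Quotient rule with f = x^2, g = 3x: (2x*3x - 3x^2) / (9x^2) = 1/3.
q = quotient_rule(lambda x: x**2, lambda x: 2 * x, lambda x: 3 * x, lambda x: 3.0, x0)
# Product rule with f = x^2, g = x: x^2*1 + x*2x = 3x^2 (= 12 at x = 2).
p = product_rule(lambda x: x**2, lambda x: 2 * x, lambda x: x, lambda x: 1.0, x0)

print(q, numeric_derivative(lambda x: x**2 / (3 * x), x0))  # both about 0.3333
print(p, numeric_derivative(lambda x: x**2 * x, x0))        # both about 12.0
```

Since x^2/(3x) simplifies to x/3, both numbers on the first line should agree with 1/3.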
I reviewed it in Chrome and Mozilla, on both PC and Android. It’s great. No issues.
Also, the writing is wonderful and quite informative.
Thanks @jeremy and @parrt for this great article.
The scalar derivative rules are a bit crammed; some spacing around them would make them easier to read.
I have heard of a “weight vector”, but what is an “edge” weight vector?
Beginners are often unsure how the dimensions of a matrix are denoted. Could it mention that m x n means (number of rows) x (number of columns)?
Is it possible to put this line earlier in the document, where bold notation is first used: "Lowercase letters in bold font such as x are vectors and those in italics font like x are scalars"
Explain that the upside-down triangle (∇) is called del or nabla (it looks like an inverted capital delta, Δ).
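The m x n and ∇ points fit together: for a function with n inputs and m outputs, the Jacobian (in the common numerator layout) has m rows, one per output function, and n columns, one per input variable; when m = 1, its single row is the gradient ∇f. Below is a minimal sketch, assuming a central finite-difference approximation. The names `jacobian` and `gradient` and the example f(x, y) = 3x^2 y are hypothetical, not taken from the article.

```python
def jacobian(func, x, h=1e-6):
    """Finite-difference Jacobian of func: R^n -> R^m at point x.

    Returns m rows (one per output) of n columns (one per input),
    i.e. an m x n matrix as a list of lists.
    """
    n = len(x)
    m = len(func(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            # Central difference: partial derivative of output i w.r.t. input j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def gradient(f, x, h=1e-6):
    """Gradient (del f) of a scalar function: the single row of its 1 x n Jacobian."""
    return jacobian(lambda v: [f(v)], x, h)[0]

# f(x, y) = 3x^2 y, so del f = [6xy, 3x^2] = [12, 3] at (1, 2).
grad = gradient(lambda v: 3 * v[0] ** 2 * v[1], [1.0, 2.0])
print(grad)  # approximately [12.0, 3.0]

# Two output functions of three inputs: the Jacobian is 2 x 3 (rows x columns).
J = jacobian(lambda v: [v[0] + v[1], v[1] * v[2]], [1.0, 2.0, 3.0])
print(len(J), len(J[0]))  # prints 2 3
```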