Preview release: "Matrix Calculus for Deep Learning"

jeremy · January 30, 2018, 1:37am

My friend @parrt and I (mainly Terence!) have written this new article: The Matrix Calculus You Need For Deep Learning. We’re planning to make it public tomorrow. For anyone interested in checking it out, we’d love to get your feedback before then.

Are there any bits that don’t make sense to you? Is the context/motivation clear? Does it render OK on your device? Does the PDF version linked there look OK on your PDF reader (currently it has known problems on Mac preview.app)?

(Of course, please don’t mention this on social media etc before we officially release it - I’ll post here when we do.)

s.s.o · January 30, 2018, 2:02am

Hi Jeremy, thank you very much for the nice work and sharing it.

I think some derivative formulas are hard to read without zooming in a couple of times (maybe I’m tired). Also, you could number some of the important equations.

jeremy · January 30, 2018, 2:56am

What device and OS are you using? Can you press Ctrl + (or Cmd +) to make the text bigger? Can you tell me a couple of equations you’re finding a little small to read?

kcturgutlu · January 30, 2018, 3:07am

Thanks for this great collaboration @jeremy and @parrt. I will definitely finish reading it this week. But for reviewing I glanced at every page just now in Chrome browser.

It looks great, clean and well organized. Only in some of the tables the math notation looks crowded; maybe more spacing between rows would give the reader some relief (for the PDF in Chrome). I would personally prefer the HTML version.

For example:

When should I share the links with my connections?

Thanks again!

jeremy · January 30, 2018, 3:30am

I’ll post here when we release it - thanks @kcturgutlu!

helena · January 30, 2018, 3:55am

Too tired to dig deeply, but at first blush this looks great. Maybe the font is a bit pale, to nitpick if I may…

JayBee · January 30, 2018, 5:37am

Sorry for the obvious noob question and thank you for this post.

I don’t understand the following notation

[image: edu-2018-01-30-11-02-20-252]

Specifically, why is there an abs(x) on top of the summation? From my understanding the rectifier passes only positive values of the affine function, computed over all x_i. So I don’t understand what the abs(x) is supposed to signify.

jeremy · January 30, 2018, 6:22am

Very understandable question! In this context the notation |x| refers to the count of items in x. There’s a notation reference at the end of the paper, BTW.

Of course, |x| has multiple different meanings, and there are multiple different ways of expressing a count (https://en.wikipedia.org/wiki/Cardinality).

Because math. sigh
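
For readers who think in code, the count reading of |x| can be sketched in Python. This is only an illustration: the function name `unit_activation` and the sample numbers are made up here, and it assumes the expression being asked about is the usual single-unit form max(0, sum over i = 1..|x| of w_i x_i + b).

```python
def unit_activation(w, x, b):
    """Rectified affine function: max(0, sum_{i=1}^{|x|} w_i * x_i + b)."""
    n = len(x)  # |x| is the number of elements in x, not an absolute value
    assert len(w) == n, "w and x must have the same length"
    z = sum(w[i] * x[i] for i in range(n)) + b  # affine part, summed over all x_i
    return max(0.0, z)  # the rectifier clips negative values to zero


# Hypothetical sample values, chosen only for illustration
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
print(len(x))                        # 3, i.e. |x|
print(unit_activation(w, x, b=0.5))  # 5.0
```

So the |x| on top of the summation is just the upper limit of the loop, `len(x)` in Python terms.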

ecdrid · January 30, 2018, 7:47am

It can’t be made any simpler than this…

Thanks a lot…

There’s one more PDF that should be read after this one… (I have read it, so just letting you know)

Convolution Mathematics.pdf (858.0 KB)

Thanks…

Both go hand in hand…

amritv · January 30, 2018, 8:09am

Hi Jeremy, reviewed on Android device, Note 8 and it rendered fine, no issues at all.
I can completely relate to the need for a paper such as this, as I have learnt more through ‘doing’ the code in your course than through learning the underlying math, as you mention in the paper. So it’s good to see the mathematical concepts, although some parts are new and some complicated for me, and will require a number of repeat reads as mentioned in the paper.
I haven’t come across a paper that presents the mathematical notation and then links it to what the code looks like. I’m not sure whether that is entirely possible or needed, but it would be great to map the notation to code with the aim of improving comprehension, maybe as a side note in the resources section.

Thank you for sharing this.

radek · January 30, 2018, 10:41am

This is amazing beyond belief. It made it into Mendeley on my phone and it looks beautiful.

I seem to overdo it when I express my appreciation of things online. And then I happen to come across as a not very serious person.

But just wanted to say. OMG. This is amazing. Many moments will be spent perusing this in great detail

Brad_S · January 30, 2018, 1:05pm

the matrix calculus and the bit leading into gradient descent look good (I ended up skimming it though). before that I have a few comments:

errors:
in table under review scalar derivatives rule, quotient rule. I don’t follow your equation. The example isn’t right. d/dx(x/3) is 1/3. you show an x^2 equation… the quotient rule should be (f'·g - f·g')/g^2. with f = x^2 and g = 3x this gives (2x·3x - 3x^2)/(9x^2) = 1/3. you’re not using it elsewhere, so maybe just remove it?
<change 1 to 2>: 1st equation under Matrix Calculus :… 2 x 1 = 1
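
A quick numeric sanity check of the quotient-rule point above, as a sketch only. It assumes the example in question is f(x) = x^2 over g(x) = 3x, which is inferred from the arithmetic in the post rather than quoted from the article; the helper names are made up for illustration.

```python
def f(x):
    return x ** 2


def f_prime(x):
    return 2 * x


def g(x):
    return 3 * x


def g_prime(x):
    return 3.0


def quotient_rule(x):
    """d/dx (f/g) = (f' * g - f * g') / g^2."""
    return (f_prime(x) * g(x) - f(x) * g_prime(x)) / g(x) ** 2


def central_difference(func, x, eps=1e-6):
    """Numerical derivative, used as an independent check."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)


# Avoid x = 0, where f/g is undefined
for x in (0.5, 1.0, 4.0):
    analytic = quotient_rule(x)
    numeric = central_difference(lambda t: f(t) / g(t), x)
    print(x, analytic, numeric)  # both derivative columns should be close to 1/3
```

Under that assumption the rule and the finite-difference estimate agree with the 1/3 stated in the post.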

ticky touchwood or cosmetic:
in table under review scalar derivatives rule, product rule, I’d show an explicit 1, to show d/dx(x), i.e. x^2·1 + x·2x
<insert “at”> which you can find Khan Academy differential calculus course.
in the Jacobian section, the spacing in the 3 ∂/∂x_i f(x) matrices makes it hard to discern columns unless you know they are there (it becomes obvious when they’re zeroed, so maybe it doesn’t matter)
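
Written out in full, the product-rule suggestion above would look roughly like this. It is a sketch that assumes the table's example is the product of x^2 and x, which is inferred from the fragment in the post and not quoted from the article.

```latex
\frac{d}{dx}\left(x^2 \cdot x\right)
  = x^2 \cdot \frac{d}{dx}(x) + x \cdot \frac{d}{dx}\left(x^2\right)
  = x^2 \cdot 1 + x \cdot 2x
  = 3x^2
```

Keeping the explicit factor of 1 makes it visible that d/dx(x) was actually applied rather than skipped.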

Brad_S · January 30, 2018, 1:09pm

is |x| better than ms and ns? (I’ve always stuck with the latter as you see them in the bottom right of the matrices)

ecdrid · January 30, 2018, 1:10pm

Yep I too feel that there’s an error in the table

We need to use u/v rule there…

Vishucyrus · January 30, 2018, 1:25pm

I reviewed it on Chrome and Mozilla, both on PC and Android. It’s great. No issues.
Also the writing is wonderful and quite informative.
Thanks @jeremy and @parrt for this great article.

reshama · January 30, 2018, 4:09pm

@jeremy
few minor comments on matrix calculus:

scalar derivative rules are a bit crammed. some spacing around would make it easier to read
have heard of “weight vector”, but what is an “edge” weight vector?
Beginners are often unsure of how to denote the dimensions of a matrix. Can it include that m x n is num of rows x number of columns?
is it possible to put this line earlier in document when bold notation is used: "Lowercase letters in bold font such as x are vectors and those in italics font like x are scalars"
explain that the upside-down triangle (∇) is called del, or nabla (it looks like an inverted capital delta)
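
On the m x n question, the rows-by-columns convention can be shown in a few lines of plain Python. This is a minimal sketch; the helper name `shape` and the sample matrix are made up here and are not taken from the article.

```python
def shape(matrix):
    """Return (m, n): m = number of rows, n = number of columns."""
    m = len(matrix)     # rows: how many inner lists
    n = len(matrix[0])  # columns: how many entries in each row
    return m, n


# A 2 x 3 matrix: 2 rows, 3 columns
A = [
    [1, 2, 3],
    [4, 5, 6],
]

m, n = shape(A)
print(f"{m} x {n}")  # 2 x 3

# Element a_{2,3} in math notation (row 2, column 3) is A[1][2] with 0-based indexing
print(A[1][2])  # 6
```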

nav13n · January 30, 2018, 4:14pm

Thanks @jeremy! I reviewed it in chrome on MacBook. Fairly clear and insightful illustration!

parrt · January 30, 2018, 4:48pm

hiya. Is there an external link for that? I could link it in.

parrt · January 30, 2018, 4:54pm

thanks! fixed. pushing as I go…

servus · January 30, 2018, 6:03pm

I think there is a typo in the matrix just before “Welcome to matrix calculus!”.
The derivative of g(x,y) with respect to x is 2, not 1.
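
The claim above can be checked numerically with a short sketch. Note the assumption: the thread does not quote the definition of g, so the code below takes g(x, y) = 2x + y^8 as a stand-in; the same result holds for any g whose only x-dependence is the term 2x. The helper name `partial_x` is made up for illustration.

```python
def g(x, y):
    # Assumed definition, not quoted in the thread
    return 2 * x + y ** 8


def partial_x(func, x, y, eps=1e-6):
    """Central-difference estimate of the partial derivative with respect to x."""
    return (func(x + eps, y) - func(x - eps, y)) / (2 * eps)


# Sample points chosen arbitrarily; y is held fixed while x is nudged
for x, y in [(0.0, 1.0), (3.0, -1.5)]:
    print(x, y, partial_x(g, x, y))  # the estimate should be close to 2 at every point
```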