I can’t remember the name of the “infamous” ML paper with a multi-page math proof in the appendix.
I remember Jeremy mentioning it at the beginning of one of the first few lessons in a previous edition of the course.
Do you remember the title?
If you tell me what the topic of the paper is (what it’s about in broad terms), it’s easier to help.
(there are quite a few infamous ml papers out there :P)
I remember now — it was the SELU paper (“Self-Normalizing Neural Networks”).
My first guesses were Szegedy’s batch norm paper or Leslie Smith’s learning rate paper, but neither has an appendix, let alone several pages of math in one.
But if you had said it’s an activation function that normalizes automatically, I would have linked to that paper.
(Of course I don’t know all papers, only 4000-5000 of them, but a little hint helps)
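For anyone curious, the activation from that paper is just a scaled ELU with two fixed constants chosen so activations stay roughly zero-mean, unit-variance (the multi-page appendix proves this). A minimal NumPy sketch, using the constants from the paper:

```python
import numpy as np

# Fixed constants from "Self-Normalizing Neural Networks" (Klambauer et al.)
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    """SELU: scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))
```

Note how negative inputs saturate at `-SCALE * ALPHA` (about -1.758), which is what keeps the variance in check; in practice you’d just use your framework’s built-in SELU.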