I didn't get the ReLU thing, why ReLU?

ctwardy · March 31, 2021, 7:59pm

FYI, links to other ReLU discussions:

ReLU unreasonably effective! Apparently the best answer was “Handout 3B.”
Why can’t ReLU [over]fit my sine wave? Has an insightful example/discussion.
Lesson 4: ReLU? Layers? Where I first made this list.
Are vanishing gradients good? Or… why did this work at all?
Google says Swish beats ReLU. (2017, so may not have held up?? But interesting.)
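
For anyone who wants to see the two activations from that last link side by side, here is a minimal sketch in plain Python. It assumes the usual definitions, ReLU as max(0, x) and Swish as x * sigmoid(beta * x) with beta = 1 by default. The function names are my own and are not taken from any of the linked threads or from a particular library.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(x: float) -> float:
    """ReLU: pass positive inputs through, clamp negatives to zero."""
    return max(0.0, x)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x scaled by a sigmoid gate (beta=1 is an assumed default)."""
    return x * sigmoid(beta * x)


if __name__ == "__main__":
    # Print both activations on a few sample inputs for comparison.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):.4f}")
```

The thing to notice is that Swish is smooth and dips slightly below zero for small negative inputs, while ReLU is exactly zero there. Whether that helps in practice is the question the linked post is about; this sketch only shows the shapes.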