FYI, links to other ReLU discussions:
- ReLU unreasonably effective! Apparently the best answer was “Handout 3B.”
- Why can’t ReLU [over]fit my sine wave? Has an insightful example/discussion.
- Lesson 4: ReLU? Layers? Where I first made this list.
- Are vanishing gradients good? Or… why did this work at all?
- Google says Swish beats ReLU. (From 2017, so the claim may not have held up — but interesting.)
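For reference, the two activations compared in that last link are simple to state; here is a minimal sketch (function names and the `beta` parameter are my own labels, with `beta=1` giving the default Swish from the 2017 paper):

```python
import math

def relu(x):
    # ReLU: max(0, x) — zero for negative inputs, identity otherwise
    return max(0.0, x)

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x) — smooth, non-monotonic near zero,
    # approaches ReLU as beta grows large
    return x / (1.0 + math.exp(-beta * x))
```

Note that Swish approximates the identity for large positive inputs (like ReLU) but lets a small gradient through for negative ones.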