Lesson 6 In-Class Discussion

This is well described here: http://cs231n.github.io/neural-networks-1/
Short answer: tanh output is zero-centered, which makes it easier for gradient descent to converge.
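To make "zero-centered" concrete, here is a rough sketch (my own illustration, not taken from the linked notes) that pushes the same random inputs through sigmoid and tanh and compares the outputs. The standard-normal inputs, the sample size, and the variable names are all assumptions chosen for the demo.

```python
import math
import random


def sigmoid(x):
    # Logistic sigmoid: squashes any real input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


# Assumed inputs: 10,000 draws from a standard normal (zero mean).
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(10000)]

sigmoid_out = [sigmoid(x) for x in xs]
tanh_out = [math.tanh(x) for x in xs]

sigmoid_mean = sum(sigmoid_out) / len(sigmoid_out)
tanh_mean = sum(tanh_out) / len(tanh_out)

# Sigmoid outputs are always positive; tanh outputs take both signs.
all_sigmoid_positive = all(s > 0 for s in sigmoid_out)
tanh_has_both_signs = any(t < 0 for t in tanh_out) and any(t > 0 for t in tanh_out)

print("sigmoid mean:", round(sigmoid_mean, 3))
print("tanh mean:", round(tanh_mean, 3))
print("all sigmoid outputs positive:", all_sigmoid_positive)
print("tanh outputs have both signs:", tanh_has_both_signs)
```

The sigmoid outputs should average close to 0.5 and never go negative, while the tanh outputs should average close to 0. When every input to the next layer is positive, the gradients for all weights of a given neuron share the same sign, so the updates tend to zig-zag. Zero-centered activations avoid that.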
