"Be Careful What You Backpropagate" paper

Thanks for sharing the paper.

You may like the thread here. It discusses a toy dataset that is easily fitted using a tanh activation rather than the conventional choice of ReLU.