Meet Mish: New Activation function, possible successor to ReLU?

@Diganta and others!
I posted my results here: "How we beat the 5 epoch ImageWoof leaderboard score - some new techniques to consider"
I tested the activations again and found a bug in the xresnet implementation!
My results for Mish vs ReLU vs LeakyReLU are:
Mish: 0.7412 (std 0.011), ReLU: 0.7572 (std 0.0100), LeakyReLU: 0.7576 (std 0.0058).
The point is that the default argument in kaiming_normal_ is nonlinearity='leaky_relu'.
If you plug in the right gain for Mish, I believe the results will be much better!
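
To make this concrete, here is a minimal PyTorch sketch of what I mean. The `mish` helper and the empirical gain estimate are just my own illustration of the idea, not the exact fix from the leaderboard thread:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mish(x):
    # Mish: x * tanh(softplus(x))
    return x * torch.tanh(F.softplus(x))

conv = nn.Conv2d(3, 64, kernel_size=3)

# What xresnet effectively does: kaiming_normal_ with no arguments
# defaults to nonlinearity='leaky_relu' (a=0), i.e. the ReLU gain sqrt(2).
nn.init.kaiming_normal_(conv.weight)

# One way to get a Mish-specific gain (an assumption, not from the thread):
# estimate it empirically so the activation preserves the second moment
# of a unit Gaussian, matching the Kaiming convention.
with torch.no_grad():
    z = torch.randn(1_000_000)
    gain = (1.0 / mish(z).pow(2).mean().sqrt()).item()

# Apply the custom gain by hand: Kaiming-normal std = gain / sqrt(fan_in).
fan_in = conv.weight[0].numel()  # in_channels * kernel_h * kernel_w
with torch.no_grad():
    conv.weight.normal_(0.0, gain / fan_in ** 0.5)
```

Note that nn.init.calculate_gain only knows the built-in nonlinearities (relu, leaky_relu, tanh, ...), so for Mish you need an empirical estimate like the one above or a hand-derived constant.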