Meet Mish: New Activation function, possible successor to ReLU?

With massive help from @iyaja and Javier Ideami, we were finally able to obtain the loss landscapes of Mish, Swish and ReLU. Mish shows an overall lower loss, better accuracy, and a better-conditioned, much smoother landscape, which makes optimization easier than with either Swish or ReLU.
Link to tweet
Javier also discussed this in depth in his talk last night at the Synthetic Intelligence Forum, Montreal, along with many other great projects and studies in the loss landscape space. YouTube link
Here are the visualizations:


For the visualizations that are not labelled, the order is ReLU -> Mish -> Swish (from left to right).
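
For anyone who wants to compare the three activations numerically, here is a minimal scalar sketch. It assumes the commonly used definitions, which are not spelled out in this post: ReLU(x) = max(0, x), Swish(x) = x * sigmoid(x) (with beta fixed at 1), and Mish(x) = x * tanh(softplus(x)). The function names are my own, and this is only an illustration, not the code used to produce the loss landscapes above.

```python
import math


def relu(x: float) -> float:
    """ReLU: passes positive inputs through, zeroes out the rest."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid, branched on sign so exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    """softplus(x) = ln(1 + e^x), rewritten to stay stable for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swish(x: float) -> float:
    """Swish with beta = 1 (assumed): x * sigmoid(x)."""
    return x * sigmoid(x)


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    # Print the three activations side by side for a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):.4f}  mish={mish(x):.4f}")
```

Unlike ReLU, both Swish and Mish are smooth and let small negative values through, which fits the smoother landscapes shown above.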
