After testing a lot of new activation functions this year, I'm excited to introduce you to one that has delivered in my testing: Mish.
Per the paper, Mish outperformed ReLU by 1.67% in final accuracy in the authors' testing and was evaluated across 70 different architectures.
I tested against ImageWoof using XResNet50 and a variety of optimizers to put Mish through its paces, and saw improved training curves and accuracy jumps of 1-3.6% merely by dropping in Mish instead of ReLU.
The overhead vs ReLU is minimal (+1 second per epoch) and so far well worth it for the accuracy gains.
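For reference, Mish is defined as mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x). Here's a minimal pure-Python sketch of that formula (the actual drop-in implements the same computation as a PyTorch module operating on tensors):

```python
import math

def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    return x * math.tanh(math.log1p(math.exp(x)))

# Unlike ReLU, Mish is smooth and non-monotonic: small negative inputs
# produce small negative outputs instead of being clipped to zero.
print(mish(0.0))   # 0.0
print(mish(-1.0))  # small negative value, not zeroed out
```

For large positive inputs Mish behaves almost like the identity (tanh(softplus(x)) approaches 1), which is part of why it swaps in cleanly for ReLU.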
I wrote a full article on Mish here:
and have a PyTorch/FastAI drop in (mish.py) and Mish XResNet here:
and here’s the paper link:
Please give Mish a try and see how it performs for you versus ReLU as I think you’ll see a nice win from it.