Meet Mish: New Activation function, possible successor to ReLU?

Not to drag the conversation too far backwards, but I think I got a nice result with EfficientNet-b3 + Mish + Ranger on Stanford Cars:

  • 93.8% 5-run, 40-epoch mean test set accuracy on Stanford Cars using EfficientNet-b3 + Mish + Ranger, beating the EfficientNet paper's b3 result by 0.2%. The EfficientNet authors' best result with b3 was 93.6% (their best EfficientNet result overall was 94.7%, with b7)
  • I found that with Ranger alone, EfficientNet was much more stable to train, but I couldn't match the accuracy from the paper; adding Mish then pushed it over the line :slight_smile: (there's a minimal Mish sketch just after this list)
  • At least for me, EfficientNet + Ranger (with and without Mish) preferred a shorter flat period than the XResNet Imagewoof work here (a start_pct of 10% worked well for me) before the cosine anneal (schedule sketch below too)
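
For anyone who hasn't seen it yet, Mish is just `x * tanh(softplus(x))`. A minimal PyTorch version (a sketch, not necessarily the exact module used in my runs) looks like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    "Mish activation (Misra, 2019): x * tanh(softplus(x))."
    def forward(self, x):
        # softplus(x) = ln(1 + e^x), a smooth approximation of ReLU
        return x * torch.tanh(F.softplus(x))
```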
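And the schedule I mean is the flat-then-cosine-anneal one from the Imagewoof experiments. Here's a sketch using fastai v1's GeneralScheduler; the function name and signature are mine for illustration, and I'm assuming start_pct means the fraction of training spent flat before annealing begins:

```python
from fastai.vision import Learner, annealing_cos
from fastai.callbacks.general_sched import GeneralScheduler, TrainingPhase

def fit_flat_anneal(learn: Learner, n_epochs: int, lr: float, start_pct: float = 0.10):
    "Train flat at `lr` for `start_pct` of the batches, then cosine-anneal the lr towards 0."
    total_batches = len(learn.data.train_dl) * n_epochs
    flat_batches = int(total_batches * start_pct)
    phases = [
        # flat phase: constant lr
        TrainingPhase(flat_batches).schedule_hp('lr', lr),
        # anneal phase: cosine from lr down to 0
        TrainingPhase(total_batches - flat_batches).schedule_hp('lr', lr, anneal=annealing_cos),
    ]
    learn.callbacks.append(GeneralScheduler(learn, phases))
    learn.fit(n_epochs)
```

Usage for the runs above would be something like `fit_flat_anneal(learn, 40, lr, start_pct=0.10)`, with the learner built on EfficientNet-b3 and Ranger as the opt_func.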

Full details over here: [Project] Stanford-Cars with fastai v1

Notebook here

Thanks for the inspiring work on this thread all!

5-run accuracy; each run took about 3 hours, so I didn't have the patience to run it 10 times :expressionless:

[Image: mean and standard deviation of test accuracy over the 5 EfficientNet-b3 runs]

Validation set (= test set) accuracy seems super stable in the last few epochs, though.
