Meet Mish: New Activation function, possible successor to ReLU?

Not to drag the conversation too far backwards, but I think I got a nice result with EfficientNet-b3 + Mish + Ranger on Stanford Cars:

  • 93.8% 5-run, 40-epoch mean test set accuracy on Stanford Cars using EfficientNet-b3 + Mish + Ranger, beating the EfficientNet paper's b3 result by 0.2%. The EfficientNet authors' best result with b3 was 93.6% (their best EfficientNet result overall was 94.7%, with b7)
  • I found that with Ranger alone, EfficientNet was much more stable to train, but I couldn't match the accuracy from the paper; adding Mish then pushed it over the line :slight_smile: (there's a minimal Mish sketch just after this list)
  • At least for me, EfficientNet + Ranger (with and without Mish) preferred a shorter flat period than the XResNet Imagewoof work here (a start_pct of 10% worked well for me) before the cosine anneal (schedule sketch below too)
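
For anyone who hasn't seen it yet, Mish is just `x * tanh(softplus(x))`. A minimal PyTorch version (a sketch, not necessarily the exact module used in my runs) looks like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    "Mish activation (Misra, 2019): x * tanh(softplus(x))."
    def forward(self, x):
        # softplus(x) = ln(1 + e^x), a smooth approximation of ReLU
        return x * torch.tanh(F.softplus(x))
```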
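And the schedule I mean is the flat-then-cosine-anneal one from the Imagewoof experiments. Here's a sketch using fastai v1's GeneralScheduler; the function name and signature are mine for illustration, and I'm assuming start_pct means the fraction of training spent flat before annealing begins:

```python
from fastai.vision import Learner, annealing_cos
from fastai.callbacks.general_sched import GeneralScheduler, TrainingPhase

def fit_flat_anneal(learn: Learner, n_epochs: int, lr: float, start_pct: float = 0.10):
    "Train flat at `lr` for `start_pct` of the batches, then cosine-anneal the lr towards 0."
    total_batches = len(learn.data.train_dl) * n_epochs
    flat_batches = int(total_batches * start_pct)
    phases = [
        # flat phase: constant lr
        TrainingPhase(flat_batches).schedule_hp('lr', lr),
        # anneal phase: cosine from lr down to 0
        TrainingPhase(total_batches - flat_batches).schedule_hp('lr', lr, anneal=annealing_cos),
    ]
    learn.callbacks.append(GeneralScheduler(learn, phases))
    learn.fit(n_epochs)
```

Usage for the runs above would be something like `fit_flat_anneal(learn, 40, lr, start_pct=0.10)`, with the learner built on EfficientNet-b3 and Ranger as the opt_func.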

Full details over here: [Project] Stanford-Cars with fastai v1

Notebook here

Thanks for the inspiring work on this thread all!

5-run accuracy; each run took about 3 hours, so I didn't have the patience to run it 10 times :expressionless:

[Image: mean and standard deviation of test accuracy over the 5 EfficientNet-b3 runs]

Validation set (= test set) accuracy seems super stable in the last few epochs, though.
