Meet Mish: New Activation Function, Possible Successor to ReLU?

Quick update: Mish on the CSP-p7 detector is currently SOTA for object detection on MS-COCO (test-dev). Additional details on paperswithcode.

2 Likes

Niiiice! Quick one: have you had time to explore how Mish performs in transformer architectures?

1 Like

Hi Morgan. Unfortunately no, I haven’t played around with Transformers much myself, so I haven’t tried Mish there yet. Would love for someone to try it out though. Recently I’ve been seeing many nice results using Mish in different tasks like 3D shape descriptors, volumetric occlusion prediction, scene flow, and segmentation (credits to @muellerzr).
Currently I am working on something more exciting that has shown much more promise than Mish. Let’s see how it goes.

5 Likes

wuuuut! Very excited for this. Will let you know if I manage to do some proper testing with Mish in Transformers.

1 Like

Great, keep me posted.
I’ll post updates here as they come.

2 Likes


This is our new work (not the one I mentioned earlier in this thread). Would love some feedback. @muellerzr @morgan @LessW2020 @ilovescience

6 Likes

Small personal update: I have now joined Weights & Biases full time as a Machine Learning Engineer. There are a lot of exciting features ahead in the pipeline, and I hope to make the WandB integration in fastai more seamless than it is now.
Mish really brought me to this position, so I want to thank all of you for supporting me along the way!!
Cheers!

12 Likes

Good luck then ^^ :slight_smile:

1 Like

Thank you!! :slightly_smiling_face:

Hey all.
Today (December 8, 5 pm PT) I’ll be speaking at the Weights & Biases Salon along with Maithra Raghu from Google Brain. My talk will be about Smooth Activations, Robustness, and Catastrophic Forgetting, where I will present a new hypothesis for lifelong learning. Maithra will talk about her paper “Do Wide and Deep Networks Learn the Same Things?”
RSVP on Zoom
YouTube Live
It’s my first time giving a talk with one of my inspirations in this field, and I would really appreciate it if you all hopped in.
Thank you!

4 Likes

Sorry, I read this too late to join :frowning: but I watched it afterward ^^

All the talks are recorded. Let me know if you have any questions regarding my presentation once you go through it. Thanks!

2 Likes

Well, I went through your whole presentation today at dawn, around 02:50 (UTC+01:00), when I wrote my previous reply, and then went to sleep (you know I’m in the EU) - that’s why I’m only replying again now ^^.

I don’t really have questions because I know your work in great detail: I read your paper when you published it on arXiv, I checked your GitHub repo right when the code was released, and I follow your posts here on the fast.ai forum :slight_smile:
(So I’m also familiar with your Triplet Attention network for other reasons; of course that’s off-topic - the video was about the Mish activation function, its properties, and its implications.)

I can follow the explanations/conversations in the video, but I had to really concentrate on what’s going on :smiley: (it was easier to read your paper at my own pace back then).
I only have one suggestion:
You could add YouTube subtitles to the video; that would make it easier to follow for those who don’t already know your work or your main points. (Not for me, but I’m thinking of other people.)

Btw, keep up the good work! :wink:

1 Like

Hey, thanks for the appreciation and for your suggestion.
The talk was not primarily on Mish, though; I wanted to connect the dots and present a new dilemma/trilemma which can arise and can potentially be solved by smooth activations.


Yes, from next time onwards I’ll make subtitles available. Also, just for everyone here, this is the link to my presentation slides.
This is the final uploaded video.

Also, since GitHub has now made Discussions available to all public repos, please feel free to use the discussion forum on my repository to discuss anything about Mish, activations, or non-linear dynamics in general.

Discussion Forum Link

4 Likes


It would be awesome if you all considered taking part in this. :slight_smile:

1 Like

I hit 1k stars on my repo. It’s insane how much the project has grown. I also hit 100 citations on Google Scholar (still don’t know why it doesn’t show the actual count, which is 121).
I want to sincerely thank everyone here for all they have done to support me throughout this project; I feel blessed and honoured.


6 Likes

Well done :smiley: :smiley: :clap: :clap:

1 Like

Thank you! :sweat_smile:

Hey guys,

Seems like there is growing interest in the community in getting Mish added to core PyTorch, just like SiLU and Hard Swish have been. Link to the issue: https://github.com/pytorch/pytorch/issues/25584

Mish had earlier been added to different experimental branches of PyTorch by internal PyTorch members.
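In the meantime, if anyone wants to try Mish before native support lands, here’s a minimal sketch of it as a PyTorch module, written straight from the published definition Mish(x) = x · tanh(softplus(x)). The class name is just illustrative, not the eventual torch.nn API:

```python
import torch
import torch.nn.functional as F
from torch import nn

class Mish(nn.Module):
    """Minimal sketch of Mish: x * tanh(softplus(x)).

    Illustrative only -- not the proposed torch.nn API.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # softplus(x) = ln(1 + e^x); tanh of it smoothly gates the input,
        # giving a small non-monotonic dip for small negative x
        return x * torch.tanh(F.softplus(x))

# Quick sanity check: Mish is ~0 for large negative x and ~x for large positive x
x = torch.linspace(-5, 5, steps=11)
print(Mish()(x))
```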

If you believe it would be useful to have Mish added to PyTorch, it would be awesome if you left a comment on that issue thread.

Thanks! :slight_smile:

4 Likes