Meet Mish: New Activation function, possible successor to ReLU?

Well, I went through your whole presentation today at dawn, around 02:50 (UTC+01:00), when I wrote my previous reply, and then I went to sleep (you know I’m in the EU) - that’s why I’m only replying again now ^^.

I don’t really have questions because I already know your work in great detail: I read the paper when you published it on arXiv, checked your GitHub repo right when the code was released, and I also follow your posts here on the fast.ai forum :slight_smile:
(So I’m also familiar with your Triplet Attention network for other reasons, though of course that’s off-topic - the video was about the Mish activation function, its properties and implications.)

I can follow the explanations/conversations in the video, but I had to really concentrate on what was going on :smiley: (back then I found it easier to read your paper at my own pace).
I only have 1 suggestion:
You could add YouTube subtitles to the video; that would make it easier to follow for those who don’t already know your work or your main points. (Not for me, but I’m thinking of other people.)

Btw, keep up the good work! :wink:

1 Like

Hey, thanks for the appreciation and also your suggestion.
The talk wasn’t primarily about Mish, though; I wanted to connect the dots and present this new dilemma/trilemma that can arise and can potentially be solved by smooth activations.
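
For anyone just joining the thread, Mish itself is the smooth activation f(x) = x · tanh(softplus(x)). A minimal PyTorch sketch of just the naive formula (not the optimized CUDA implementation from the repo) looks like this:

```python
import torch
import torch.nn.functional as F

class Mish(torch.nn.Module):
    """Naive Mish: f(x) = x * tanh(softplus(x))."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(F.softplus(x))
```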


Yes, I guess from next time onwards I’ll make subtitles available. Also, just for everyone here, this is the link to my presentation slides.
This is the final video uploaded.

Also, since GitHub has now made Discussions available to all public repos, please feel free to use the discussion forum on my repository to discuss anything about Mish, activations, or non-linear dynamics in general.

Discussion Forum Link

4 Likes


It would be awesome if you all would consider taking part in this. :slight_smile:

1 Like

I hit 1k stars on my repo. It’s insane how much the project has grown. I also hit 100 citations on Google Scholar (I still don’t know why it doesn’t show the actual count, which is 121).
I want to sincerely thank everyone here for all that they have done to support me throughout this project; I feel blessed and honoured.


6 Likes

Well done :smiley: :smiley: :clap: :clap:

1 Like

Thank you! :sweat_smile:

Hey guys,

It seems there’s growing interest in the community in getting Mish added to core PyTorch, just like SiLU and Hardswish have been added. Link to the issue - https://github.com/pytorch/pytorch/issues/25584

Mish had earlier been added to various experimental branches of PyTorch by internal PyTorch members.

If you believe it would be useful to have Mish added to PyTorch, it would be awesome if you could leave a comment on that issue thread.

Thanks! :slight_smile:

4 Likes


20k views :dizzy_face:

2 Likes

Some exciting updates:
I will soon be releasing new benchmarks with Mish for object detection and instance segmentation models. It will be a whole suite based on MMDetection and powered by Weights & Biases.

4 Likes

First model in the series: Mask R-CNN with a ResNet-50 + Mish. Links (including log files and weights):
Results: https://github.com/digantamisra98/Mish/tree/master/PyTorch%20Benchmarks
Weights & Biases Dashboard: https://wandb.ai/diganta/Mish
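
The benchmarks themselves run through MMDetection configs (see the repo above). Purely as an illustration of the general idea, and not the exact setup behind these numbers, here is one way to swap every ReLU in a torchvision Mask R-CNN for Mish; `maskrcnn_resnet50_fpn` and `nn.Mish` are standard torchvision/PyTorch names, while the recursive swap helper is just a hypothetical sketch:

```python
import torch.nn as nn
from torchvision.models.detection import maskrcnn_resnet50_fpn

def swap_relu_for_mish(module: nn.Module) -> None:
    # Recursively replace every nn.ReLU child with nn.Mish (needs PyTorch >= 1.9).
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.Mish())
        else:
            swap_relu_for_mish(child)

model = maskrcnn_resnet50_fpn()  # randomly initialized backbone + heads
swap_relu_for_mish(model)
```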

@muellerzr it said I can’t reply to this thread more than six times in a row, and someone needs to reply to my last post before I can create a new reply. Is this expected? If so, is there a workaround other than editing old posts? Thanks!

3 Likes

Second model in the series: Faster R-CNN with a ResNet-50 + Mish. A whopping 1.3% AP boost over vanilla Faster R-CNN. Links (including log files and weights):
Results: https://github.com/digantamisra98/Mish/blob/master/PyTorch%20Benchmarks/Readme.md#faster-rcnn
Weights & Biases Dashboard: https://wandb.ai/diganta/Mish?workspace=user-diganta
Per Epoch Performance Dashboard on WandB: https://wandb.ai/diganta/mmdetection-tools?workspace=user-diganta

Some results:

@muellerzr works now

[Update]

Thanks to the massive efforts of Javier Ideami, there is now an interactive web visualizer of the loss landscapes of a ResNet with Mish, Swish, and ReLU. Link.

4 Likes

In 2019, I started a small research group called Landskape with the vision of fostering interdisciplinary research into the “whys” and “hows” of deep neural networks. Today I’m excited to launch our Twitter page for Landskape - https://twitter.com/LandskapeAI. We have had researchers and students from UIUC, IIT-G, MILA, KAIST, HKUST, and Imperial College, with collaborators from Continual AI and Google, and with backgrounds in math, physics, computer science, and even design. Right now we are working on projects in super resolution and continual learning. To stay updated on our research, consider following our Twitter channel or visiting our page.

2 Likes

Heads up: Mish has now been added to PyTorch and will be included in the 1.9 release. View the merged PR here.
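
Once 1.9 lands, it should be usable like any other built-in activation, both as a module and functionally:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(8, 16)

act = nn.Mish()   # module form, e.g. inside an nn.Sequential
y1 = act(x)

y2 = F.mish(x)    # functional form
```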

8 Likes

I ran some benchmarks with the PyTorch native Mish implementation, other Mish implementations, and a few other activation functions. For the most part it’s quite good.

On a Tesla V100, native Mish was faster than native ReLU, and in float16 it was the fastest and second fastest on the forward and backward passes, respectively. On a Tesla P100, native Mish was faster than or tied with all the other Mish implementations, including MishCuda, but lagged behind ReLU.

The only sore spot was CPU performance, where it was significantly slower than both a TorchScript version and raw PyTorch during the forward pass.
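
For anyone who wants to reproduce a rough version of these timings, `torch.utils.benchmark` is one way to go. This is only a sketch under my own assumptions (tensor shape, fp16, a CUDA device available), not the actual benchmark script behind the post:

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

# Assumed shape/dtype; fp16 on GPU is where native Mish looked best above.
x = torch.randn(64, 256, 56, 56, device="cuda", dtype=torch.float16)

for name, stmt in [("mish", "F.mish(x)"), ("relu", "F.relu(x)")]:
    timer = benchmark.Timer(stmt=stmt, globals={"F": F, "x": x})
    print(name, timer.timeit(100))  # prints a Measurement summary over 100 runs
```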

3 Likes

Interesting, the benchmarks link leads me to a 404. Can you share the correct link?

I copied an old link with the wrong date. Should be fixed now.

1 Like

This is one of the most informative blog posts I have seen in a while. Interesting to see that Mish is faster than native ReLU in PyTorch 1.9 on the forward and backward passes for fp16.

2 Likes

My GitHub profile (digantamisra98 (Xa9aX ツ) · GitHub) was approved for the GitHub Sponsors program. This is absolutely huge for me. With the research and reproducibility experiments I’m currently doing on robustness, this will help me scale up and remove barriers that currently exist in my research pipeline. If you would like to contribute to my research, head over to Sponsor @digantamisra98 on GitHub Sponsors · GitHub.

3 Likes

New paper alert - [Paper] Anytime Progressive Pruning
Feedback welcome :slight_smile:

1 Like