Meet Mish: New Activation function, possible successor to ReLU?


I’ll replicate the results of this paper today, and I’ve also had a talk with the author. Hopefully we’ll see some progress.
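For anyone just joining: the paper defines Mish as f(x) = x · tanh(softplus(x)). A quick NumPy sketch (my own minimal version, not the paper’s reference code):

```python
import numpy as np

def softplus(x):
    # numerically stable softplus: log(1 + e^x)
    return np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))

def mish(x):
    # Mish: f(x) = x * tanh(softplus(x))
    return x * np.tanh(softplus(x))

xs = np.linspace(-5, 5, 11)
ys = mish(xs)  # smooth everywhere, ReLU-like for large positive x
```

It behaves like ReLU for large positive inputs but stays smooth and lets small negative values through, which is the property people suspect helps optimization.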

1 Like

Someone mentioned earlier how useful it is to have these discussions here on the forum, and I just want to reiterate that. It’s very nice to have all of us here in such a healthy environment, discussing these complex ideas yet breaking them down in a way the rest of the forum (myself very much included) can understand, and keeping it comfortable enough that no question feels stupid. Having this has definitely boosted my confidence in doing and discussing this kind of work. Thanks guys :slight_smile:

9 Likes

I’m also enjoying watching you folks developing deep expertise and turning it into great results! :smiley:

5 Likes

@Diganta I would have called that a math problem :stuck_out_tongue: … we can always approximate; from a practitioner’s point of view we only need a number (of course). BTW, those 2 papers reminded me of https://weightagnostic.github.io/ (paper: https://arxiv.org/abs/1906.04358). It has a WTF moment at the end; it’s one of those papers you’ll enjoy reading, if you haven’t already.

2 Likes

I am dedicated to solving it and obtaining the EOC and ROC for Mish. I had heard of this paper but hadn’t read it; I surely will. There’s so much progress being made in this domain with every passing day that it sometimes gets overwhelming. Haha. Then again, that’s maybe because I’m still an undergraduate and not yet acquainted with the pace of the whole research scene.

1 Like

@Diganta Don’t worry you’re not alone in this endeavor! I am as well :slight_smile:

1 Like

@LessW2020 there are a couple of comments (besides mine) on your Meet Mish article on Medium which I think you should address.

I made it work. In the constructor of Res2Net, replace this

    self.avgpool = nn.AvgPool2d(7, stride=1)

with that

    self.avgpool = nn.AdaptiveAvgPool2d(1)
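For context on why the swap works (a quick sketch; the shapes here are illustrative, not from the actual Res2Net config): `nn.AvgPool2d(7, stride=1)` hard-codes a 7×7 window, so any feature map that isn’t exactly 7×7 leaves a spatial grid behind and breaks the flatten + linear head, while `nn.AdaptiveAvgPool2d(1)` always reduces to 1×1 regardless of input resolution:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 512, 9, 9)  # e.g. a feature map from a non-standard input size

fixed = nn.AvgPool2d(7, stride=1)(x)    # -> (2, 512, 3, 3): leftover spatial grid
adaptive = nn.AdaptiveAvgPool2d(1)(x)   # -> (2, 512, 1, 1): always pools to 1x1
```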

2 Likes

@Diganta sorry, wrong paper. That one is good, but the real gem is this one: “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks” https://arxiv.org/abs/1803.03635

1 Like

Thanks a ton @grankin! Now we can test it out and see how/if it helps.
Much appreciated.

1 Like

@Redknight I have the arrays of sigma_b (standard deviation of the bias initializer), sigma_w (standard deviation of the weight initializer) and q (the EOC phase-plane boundary values); how do I plot the EOC curve?
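In case it helps to compare against those arrays, here is how I’d trace the curve end to end (a sketch, assuming the usual mean-field recursions from Poole/Schoenholz et al.; all helper names here are my own): parametrize by the fixed-point variance q*, solve the χ₁ = 1 condition for σ_w, back out σ_b, and plot the resulting (σ_w, σ_b) pairs:

```python
import numpy as np

def mish(x):
    # Mish: x * tanh(softplus(x)), with a numerically stable softplus
    sp = np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))
    return x * np.tanh(sp)

def dmish(x, eps=1e-5):
    # derivative via central differences (accurate enough for the quadrature below)
    return (mish(x + eps) - mish(x - eps)) / (2 * eps)

# Gauss-Hermite quadrature for E_{z ~ N(0,1)}[f(z)]
nodes, weights = np.polynomial.hermite.hermgauss(100)
z, w = np.sqrt(2.0) * nodes, weights / np.sqrt(np.pi)

def gauss_mean(f, q):
    return float(np.sum(w * f(np.sqrt(q) * z)))

def eoc_point(q_star):
    """Return (sigma_w, sigma_b) on the EOC for fixed-point variance q*."""
    sw2 = 1.0 / gauss_mean(lambda u: dmish(u) ** 2, q_star)  # enforce chi_1 = 1
    sb2 = q_star - sw2 * gauss_mean(lambda u: mish(u) ** 2, q_star)
    return (np.sqrt(sw2), np.sqrt(sb2)) if sb2 >= 0 else None

curve = [p for p in (eoc_point(q) for q in np.linspace(0.05, 5.0, 50)) if p]
# then e.g. plt.plot([sw for sw, _ in curve], [sb for _, sb in curve])
```

Once you have paired (σ_w, σ_b) values, the EOC plot is just σ_b against σ_w; the parametric-in-q* route above avoids having to root-find σ_b for each σ_w separately.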

I’m going to run some tabular tests today and tomorrow on the following datasets:

  • Adults -> Binary
  • Rossmann -> Regression
  • PUC-Rio -> Multi Class (non-binary)
  • Brazil Air Pollution -> Time Series

With the following setup:

  • Base
  • Flatten and Anneal
  • Mish
  • RangerLars and Ranger + LookAhead

I will fill in the results when they are done.

1 Like

Hey Zach, can you direct me to a repository where you are maintaining the logs, or these results in general? That would be useful.

Sure :slight_smile: I keep a little side “play” repository here. The notebooks will get uploaded to their own folder soon; I’ll name it “Tabular Optimizer Experiments”.

If I find anything substantial I’ll also make a separate repository

1 Like

The Adult results are in and, well… they surprised me, to say the least. Essentially: Flatten + Ranger wins again, but Flatten + Ranger + Mish loses :frowning: Perhaps we need to rethink the implementation somehow, @Diganta? I still need to run on the other datasets though.

Except that when trained for longer (10 epochs instead of 5), the advantage vanishes.

1 Like

Hi @Diganta thanks for the notice. I have responded to all the comments including yours, and updated the article to add some of our results from here as well as the benchmark testing results.
And of course I now link to your github :slight_smile:

2 Likes

I’ll take a look. But just for clarity, would it be possible for you to post the standard deviation of the results across the various runs you obtained?

1 Like

Thanks @LessW2020 :slight_smile:

Those should be included in the table :slight_smile:

See the readme in the folder. Otherwise I’ll post everything once it’s all done.

1 Like

@grankin @LessW2020 if you guys want to try Res2Net, I would suggest gavsn’s implementation!
I have been experimenting with Res2Net (unrelated to this topic) and the performance boost is indeed quite welcome. Also, if you need something more in the fashion of the torchvision ResNet, you can check my personal modifications: https://github.com/frgfm/Holocron/blob/master/holocron/models/res2net.py (which I’m using extensively for object detection)

Glad to see that you are still picking the next low-hanging fruit to climb that leaderboard :wink:

Cheers

3 Likes