Meet Mish: New Activation function, possible successor to ReLU?

Thank you for the support. I don’t put myself on the same level as the PhD folks; I know for sure they are more educated and knowledgeable than I am, but yes, I will keep working hard.
He claimed that the benchmarks/results presented in the paper are made up, which they are not. I didn’t appreciate the attitude of discarding someone’s work without trying it out for yourself.
Yes, I do agree that it might not get accepted, and it’s obviously not an out-of-the-blue breakthrough paper, but I’m constantly trying to improve it and I hope I can present it at a top conference.
The pre-prints that are out on arXiv were just written in Word, but I’ll switch to LaTeX (it’s been quite a while since I used LaTeX or Overleaf). I guess I need to roll up my sleeves and keep working on it; I’ve come way too far to back down now.
Thanks again!

5 Likes

Hey Bjorn

Thank you for being here and presenting your views. I am working on getting a standard ResNet trained on ImageNet and will upload the trained network weights for the same. It will take some time though.

2 Likes

Oh, I didn’t realise that they meant the test numbers were “made up”; I was not able to log in to the Slack.
That is a totally inappropriate accusation, unless they can prove it, which they obviously cannot.

2 Likes

How long does it take to train a full ResNet? I have only trained CIFAR-10 with Mish, and that took maybe 30 min. But if you want to do the full 1000 ImageNet classes with high-res images, I imagine it will take much longer.

(And I trained on a TPU; see the earlier notebook.)

1 Like

@Bjorn I’m most probably doing a MobileNet v2 on ImageNet this week, but it’s going to take some time for sure. Regarding CIFAR, I don’t think it’s necessary to have a pre-trained network on CIFAR; correct me if I’m wrong.

1 Like

I agree you don’t need to have it pretrained since it is so small. That was only for context on how long it takes to train.

1 Like

Hi Jeremy, you should never think about giving up because of the negative people. I can’t begin to express how much joy and happiness fast.ai has brought many people. You just have to look at how stoked some people are when they get their teddy bear classifier working or share their work.

Keep up the good work.
mrfabulous1 :smiley::smiley::smiley:

6 Likes

When I feel low, I watch one of your classes and it sets things right. Please don’t ever think of giving up.

4 Likes

@Diganta, as a fellow undergraduate I know that words like those can have a profound effect on us. It’s disheartening and discouraging, even at the best of times. Just keep pushing on. The work you are doing is absolutely fantastic, and the community here is only helping you go further :slight_smile: Keep doing what you’re doing and you’ll do just fine! :slight_smile:

5 Likes

Thank you for the motivation and support. I will definitely keep up or increase my pace and work even harder.

1 Like

Being affected by disrespectful criticism is a very human reaction, and one that I think everyone has, so there’s no need to apologize. The work you’re doing (as an undergrad no less!) is brilliant, and if you keep at it you’ll be miles ahead of those doing the critiquing in no time.

Perseverance is the most critical skill in almost any field. One of the things I love about the fastai community is that it’s built around a foundation of encouragement and experimentation. The number of times I’ve seen ‘why don’t you try it and report back’ on the forums, and the amazing follow-through of the community here, is part of why I’m proud to call myself a fastai member.

Criticism can be helpful if it’s constructive and helps drive the improvement of the solution. But discouragement and putting people down have no place in this community. And encouragement can go a long way towards helping someone achieve their vision.

So thank you for the work you do and keep up the good work!

5 Likes

@Diganta I wanted to personally thank you for your contributions to ML research. Speaking from my first-hand experience using Mish as a drop-in replacement for ReLU in CNNs applied to medical datasets, I have found that it almost always improves my results. :slight_smile:

Keep your head up and keep innovating!
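
For anyone new to the thread, Mish is defined as f(x) = x · tanh(softplus(x)). Here is a minimal PyTorch sketch of using it as a drop-in ReLU replacement; the conv block is just an illustration, not code from the paper’s repo:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    "Mish activation: x * tanh(softplus(x))"
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

# drop-in swap: replace nn.ReLU() with Mish() in any block
conv_block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    Mish(),  # was: nn.ReLU()
)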

3 Likes

I’d like to join the others and also thank you for your work developing Mish. I have also used it on many time series datasets, and I usually get some performance increase!
When you assume you won’t be able to satisfy everyone regardless of what you do, your life becomes easier.
Keep up your great work! You are making great contributions to this DL community!

2 Likes

@jeremy, thank you for sharing this. I would never have imagined you facing this or feeling this way. Much respect for your reflections above on toxicity and for finding the energy to keep giving and creating the way you do.

2 Likes

@Even @jamesrequa @oguiza Thank you for the support. Means a lot. I will continue with my work as I always have been.

8 Likes

Hi all,
I would like to apologize to @Diganta and other community members for the harsh words I used on a thread in a Slack community. I had no intention of being hurtful, and I should have used better words to convey what I was trying to say.

I have always appreciated independent researchers and their contributions to the community. These days, it has become difficult to trust research papers.

Using the Mish activation, I created a kernel where I show how, just by changing the activation of two dense layers to Mish, an improvement in ROC AUC is observed: https://www.kaggle.com/abhishek/entity-embeddings-to-handle-categories-using-mish (probably the first public Kaggle kernel using Mish).
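
The actual code is in the kernel linked above; purely as a hypothetical sketch of the idea, here is a tabular head where the only change from a ReLU baseline is the activation after the two dense layers (the function name and sizes are made up; nn.Mish ships with recent PyTorch, and on older versions you can define it as x * tanh(softplus(x))):

import torch.nn as nn

# hypothetical head, not the kernel's code: only the activations
# differ from the ReLU baseline
def dense_mish_head(in_features, hidden=512, n_out=1):
    return nn.Sequential(
        nn.Linear(in_features, hidden), nn.Mish(),   # was: nn.ReLU()
        nn.Linear(hidden, hidden // 2), nn.Mish(),   # was: nn.ReLU()
        nn.Linear(hidden // 2, n_out),
    )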

Once again, I would like to apologize for hurting the sentiments of the author and everyone else involved. I am not toxic and had no intention of being so. I wish Diganta the best of luck, and I hope this becomes a strong foundation for his future in the field of machine learning.

16 Likes

@Diganta

I have used Mish, and I am currently in 11th place out of 400 in the Kannada MNIST Kaggle competition. So Mish is a really great function to have in the network. :smiley:

6 Likes

For anyone using v2: I’m working on getting fit_fc into the library. For now, here is some source code to use (I’ll update this post when it’s in there).

#export
# assumes fastai v2 utilities are in scope, e.g. via: from fastai2.basics import *
# (provides SchedCos, SchedLin, SchedExp, SchedNo, combine_scheds, ParamScheduler, Learner)
def FlatCosAnnealScheduler(learn:Learner, lr:float=4e-3, tot_epochs:int=1, start_pct:float=0.72,
                           curve='cosine'):
    "Build an lr schedule: flat at `lr` for `start_pct` of training, then anneal to 0"
    if   curve == "cosine":      curve_sched = SchedCos(lr, 0)
    elif curve == "linear":      curve_sched = SchedLin(lr, 0)
    elif curve == "exponential": curve_sched = SchedExp(lr, 0)
    else: raise ValueError(f"annealing type not supported {curve}")
    scheds = [SchedNo(lr, lr), curve_sched]  # constant phase, then annealing phase
    return {'lr': combine_scheds([start_pct, 1-start_pct], scheds)}

def fit_fc(learn, tot_epochs:int=1, lr:float=1e-2, start_pct:float=0.72):
    "Fit a model with Flat Cosine Annealing"
    scheds = FlatCosAnnealScheduler(learn, lr, start_pct=start_pct, tot_epochs=tot_epochs)
    learn.fit(tot_epochs, lr, cbs=ParamScheduler(scheds))

To use:

fit_fc(learn, tot_epochs=5, lr=lr)
4 Likes

Awesome, congrats. Thanks for the appreciation!

1 Like

@LessW2020 do you think you’ll have some time to try to convert Ranger (or your new Ranger) over to 2.0? There are some optimization issues I’m working on debugging. Worst case, I’ll try to finish it this weekend :slight_smile: