I recently read the mobilenet v3 paper and they used an approximation for swish called hard-swish: x.Relu6(x+3)/6. I extended this to mish and found that x.Relu5(x+3)/5 also seems to be a good approximation for mish. I guess hard-mish would be a good name. Here’s the notebook with my workings.

Can you shift the minima to -1? And also for the unnamed function curve (marked as yellow) in the 2nd figure, I see a discontinuity at x = 1. Based on intuition we want the function curve to be as continuous as possible, can the function be made purely smooth in the positive domain like for instance just have the effect on yellow line and keep the positive domain as ReLU?
For instance if f(x) = yellow line, then maybe we can write it as:

Thanks. I think the closest to those comments is x.Relu3(x+3)/3. It’s pretty smooth and goes further towards -1. All of the other functions (including hard-swish) actually have a slight discontinuity where the Relu part is equal to the divisor. The equivalent problem area is x=0 for the Relu3 function, but it doesn’t seem to have a discontinuity.

Yes, it is, however with CUDA optimization, we were able to get it faster and more closer to ReLU. You can find the CUDA optimized version of Mish here - https://github.com/thomasbrandon/mish-cuda