I recently read the MobileNetV3 paper, where they use an approximation of swish called hard-swish: x · ReLU6(x + 3) / 6. I extended the same idea to mish and found that x · ReLU5(x + 3) / 5 also seems to be a good approximation; I guess "hard-mish" would be a fitting name. Here’s the notebook with my workings.
It also made me wonder whether the following activation would be any good: x · ReLU4(x + 3) / 4.
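For anyone who wants to play with these, here’s a minimal NumPy sketch of the three piecewise approximations next to exact mish. I’m writing ReLUk(x) as min(max(x, 0), k); the helper names (`relu_k`, `hard_mish4`) are just my own for this snippet.

```python
import numpy as np

def relu_k(x, k):
    # ReLUk(x) = min(max(x, 0), k); ReLU6 is the usual clipped ReLU
    return np.clip(x, 0.0, k)

def mish(x):
    # exact mish: x * tanh(softplus(x))
    return x * np.tanh(np.log1p(np.exp(x)))

def hard_swish(x):
    # MobileNetV3 hard-swish: x * ReLU6(x + 3) / 6
    return x * relu_k(x + 3, 6) / 6

def hard_mish(x):
    # proposed hard-mish: x * ReLU5(x + 3) / 5
    return x * relu_k(x + 3, 5) / 5

def hard_mish4(x):
    # the second variant: x * ReLU4(x + 3) / 4
    return x * relu_k(x + 3, 4) / 4

# rough sanity check: worst-case gap between mish and hard-mish on [-6, 6]
xs = np.linspace(-6.0, 6.0, 1201)
max_err = float(np.max(np.abs(mish(xs) - hard_mish(xs))))
print(f"max |mish - hard_mish| on [-6, 6]: {max_err:.3f}")
```

On my quick check the worst-case gap sits in the negative bump region (around x ≈ -2 to -3), which matches what the plots in the notebook show.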
I haven’t had time to experiment with neural nets, but maybe someone here can try them out.