SWISH: Google researchers found a new activation function to replace ReLU

Hi! Recently, researchers from Google published a paper about a new activation function they call Swish.
The formula is simply sigmoid(x) * x. The paper shows that it matches or outperforms the ReLU activation function in nearly every experiment. Paper.
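The formula is small enough to sketch in a few lines. Here is a minimal NumPy version (my own sketch, not code from the paper), side by side with ReLU for comparison:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Swish from the paper: x * sigmoid(x)
    return x * sigmoid(x)

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print("swish:", swish(x))  # negative inputs are damped, not zeroed out
print("relu: ", relu(x))
```

Unlike ReLU, Swish is smooth and non-monotonic: it lets a small negative signal through near zero, then approaches x * 1 = x for large positive inputs.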


The question is: how will it affect performance? ‘relu’ is much simpler to implement, and although ‘swish’ is also simple, it still requires an exponent function.
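One way to get a feel for the overhead is a quick micro-benchmark. This is only a sketch with NumPy and `timeit` (actual numbers depend entirely on hardware, array size, and whether the framework fuses the ops on GPU):

```python
import timeit
import numpy as np

x = np.random.randn(1_000_000).astype(np.float32)

def relu(a):
    return np.maximum(0.0, a)

def swish(a):
    # x * sigmoid(x), written as x / (1 + exp(-x)) to save one multiply
    return a / (1.0 + np.exp(-a))

t_relu = timeit.timeit(lambda: relu(x), number=100)
t_swish = timeit.timeit(lambda: swish(x), number=100)
print(f"relu: {t_relu:.3f}s   swish: {t_swish:.3f}s")
```

On GPU the exponent is cheap relative to memory traffic, so in practice the gap may be much smaller than a CPU micro-benchmark suggests.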

Note that it actually looks a lot like ReLU.


It also looks like an inverted low-pass filter with resonance. http://media.soundonsound.com/sos/oct99/images/synth13_14.gif


Ha! Yeah, looks like 6db slope, too :grin: :+1:

I only skimmed the paper, but it seems they never tested it without batch norm, correct?

Here’s the quote from the paper:

The success of Swish implies that the gradient preserving property of ReLU (i.e., having a derivative of 1 when x > 0) may no longer be a distinct advantage in modern architectures. In fact, we show in the experimental section that we can train deeper Swish networks than ReLU networks when using BatchNorm (Ioffe & Szegedy, 2015).