SWISH: Google researchers found a new activation function to replace ReLU

bushaev · October 22, 2017, 2:04pm

Hi! Researchers from Google recently published a paper about a new activation function they call Swish.
The formula is simply sigmoid(x) * x. The paper shows that it matches or outperforms the ReLU activation function in nearly every experiment. Paper.
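Below is a minimal sketch of the formula from the post, using only Python's standard library. The names `sigmoid`, `swish` and `relu` are my own; this illustrates the formula and is not the paper's reference implementation.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), computed so exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)

def swish(x: float) -> float:
    """Swish as described in the post: sigmoid(x) * x."""
    return sigmoid(x) * x

def relu(x: float) -> float:
    """ReLU for comparison: max(0, x)."""
    return max(0.0, x)

if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):.4f}")
```

The two-branch sigmoid is a common way to avoid overflow for large negative inputs; a one-line `1 / (1 + math.exp(-x))` works too for moderate values.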

alexott · October 22, 2017, 2:11pm

The question is: how will it affect performance? ReLU is much simpler to implement, and although Swish is also simple, it still requires an exponential function.
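To get a rough feel for the extra cost of the `exp` call, here is a micro-benchmark sketch using the standard library's `timeit`. It times pure-Python scalar functions, so the numbers say little about vectorized or GPU kernels used in real training; treat them as an illustration only. The helper `bench` is hypothetical, written just for this example.

```python
import math
import timeit

def relu(x: float) -> float:
    # One comparison, no transcendental function.
    return x if x > 0.0 else 0.0

def swish(x: float) -> float:
    # x * sigmoid(x) == x / (1 + e^-x): one exp per element.
    # math.exp(-x) would overflow for x below about -709; fine for this demo range.
    return x / (1.0 + math.exp(-x))

def bench(fn, xs, repeat=5):
    """Hypothetical helper: best time (seconds) to apply fn over xs once."""
    timer = timeit.Timer(lambda: [fn(x) for x in xs])
    return min(timer.repeat(repeat=repeat, number=1))

if __name__ == "__main__":
    xs = [i / 1000.0 - 5.0 for i in range(10_000)]  # inputs in [-5, 5)
    t_relu = bench(relu, xs)
    t_swish = bench(swish, xs)
    print(f"relu:  {t_relu * 1e3:.2f} ms")
    print(f"swish: {t_swish * 1e3:.2f} ms  ({t_swish / t_relu:.1f}x relu)")
```

In Python the interpreter overhead dominates, so the measured ratio understates how much the exponential matters in a tight compiled kernel.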

machinethink · October 22, 2017, 2:11pm

Note that it actually looks a lot like ReLU.
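The resemblance can be checked numerically. This sketch (plain Python, function names my own) shows Swish approaching ReLU for large |x| and locates the minimum of Swish, a small negative dip near x ≈ -1.28 that ReLU does not have.

```python
import math

def swish(x: float) -> float:
    # x * sigmoid(x), written as x / (1 + e^-x); fine for the moderate inputs used here.
    return x / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    return max(0.0, x)

# For large |x| the two functions nearly coincide.
for x in (-10.0, -5.0, 5.0, 10.0):
    print(f"x={x:+5.1f}  relu={relu(x):8.5f}  swish={swish(x):8.5f}")

# Unlike ReLU, Swish is not monotonic: it dips below zero for negative x.
# A simple grid search over [-5, 0] locates the minimum.
grid = [-5.0 + i * 1e-4 for i in range(50_001)]
x_min = min(grid, key=swish)
print(f"minimum of swish ~ {swish(x_min):.4f} at x ~ {x_min:.4f}")
```

A grid search is crude but enough here; setting the derivative to zero gives the condition 1 + x(1 - sigmoid(x)) = 0, which can also be solved numerically.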

neuralMax · October 22, 2017, 2:23pm

It also looks like an inverted low-pass filter with resonance. http://media.soundonsound.com/sos/oct99/images/synth13_14.gif

cold_fashioned · October 25, 2017, 8:04pm

Ha! Yeah, it looks like a 6 dB slope, too.

pietz · October 28, 2017, 9:28am

I only skimmed the paper, but it seems they never tested it without batch norm. Is that correct?

bushaev · October 28, 2017, 9:42am

Here’s a quote from the paper:

The success of Swish implies that the gradient preserving property of ReLU (i.e., having a derivative of 1 when x > 0) may no longer be a distinct advantage in modern architectures. In fact, we show in the experimental section that we can train deeper Swish networks than ReLU networks when using BatchNorm (Ioffe & Szegedy, 2015).
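The "gradient preserving property" mentioned in the quote can be made concrete by comparing derivatives. Differentiating sigmoid(x) * x gives sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x)). The sketch below (helper names my own) evaluates it: ReLU's derivative is exactly 1 for x > 0 and 0 for x < 0, while Swish's derivative is 0.5 at zero, generally nonzero for negative inputs, and slightly above 1 for some positive inputs before settling back toward 1.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def swish_grad(x: float) -> float:
    """d/dx [x * sigmoid(x)] = s + x * s * (1 - s), with s = sigmoid(x)."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)

def relu_grad(x: float) -> float:
    """ReLU derivative: 1 for x > 0, else 0 (0 at x == 0 is a convention here)."""
    return 1.0 if x > 0.0 else 0.0

for x in (-4.0, -1.0, 0.0, 1.0, 2.4, 6.0):
    print(f"x={x:+.1f}  relu'={relu_grad(x):.3f}  swish'={swish_grad(x):.4f}")

# Largest Swish derivative on a grid over [0, 10]: slightly above 1.
peak = max((i * 1e-3 for i in range(10_001)), key=swish_grad)
print(f"max swish' ~ {swish_grad(peak):.4f} at x ~ {peak:.2f}")
```

So Swish does not keep the derivative pinned at 1 for positive inputs the way ReLU does, which is the property the authors argue matters less once BatchNorm is in the network.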