ReLU and its effectiveness


(Andrea de Luca) #1

We all know the advantages and disadvantages of ReLU with respect to other popular nonlinearities like sigmoid, tanh, etc.

What I struggle to understand is its effectiveness in allowing an MLP to approximate nonlinear functions and to separate nonlinear regions.

ReLU is, in the end, the most trivial linear function (the identity x, which leaves its input untouched) glued to the constant function 0.
Restricting ourselves to a single neuron, it just leaves the result of a dot product as it is, or suppresses it altogether if it’s negative.
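A minimal Python sketch of the two sentences above (`relu` and `relu_neuron` are illustrative names, not a library API). Each piece of ReLU is linear, yet the glued function fails the additivity and homogeneity tests that every linear map must pass, which is all "nonlinear" means here.

```python
def relu(z):
    """ReLU: identity for positive inputs, 0 otherwise."""
    return max(0.0, z)

def relu_neuron(w, x, b):
    """A single ReLU unit: the dot product w.x + b, clipped at zero."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(z)

# A linear map f must satisfy f(a + b) == f(a) + f(b) and f(k * a) == k * f(a).
a, b = 1.0, -1.0
print(relu(a + b), relu(a) + relu(b))      # 0.0 1.0  -> additivity fails
print(relu(-2.0 * 1.0), -2.0 * relu(1.0))  # 0.0 -2.0 -> homogeneity fails

# The single-neuron behaviour: pass the dot product through, or suppress it.
print(relu_neuron([1.0, -2.0], [3.0, 1.0], 0.5))  # 1.5 (w.x + b = 1.5 > 0)
print(relu_neuron([1.0, -2.0], [1.0, 1.0], 0.5))  # 0.0 (w.x + b = -0.5 < 0)
```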

How can ReLU be a useful nonlinearity? After all, we know that a NN with only linear activations (even the most general mx+q, with m and q varying for each layer or even each neuron) would not be capable of separating nonlinear regions: any composition of linear mappings, no matter how long, is just a linear mapping.
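A small sketch of both halves of this argument, using hand-picked weights for a 2-2-1 network (a standard textbook construction for XOR; the specific values are illustrative). Without an activation, the two layers collapse into a single affine map, which can never fit XOR. Inserting ReLU on the hidden layer, with the same weights, separates it exactly.

```python
def matvec(W, x):
    """Matrix-vector product for lists of lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]

def relu_vec(v):
    return [max(0.0, vi) for vi in v]

# Hand-picked weights for a 2-2-1 network (illustrative values).
W1, b1 = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]
W2, b2 = [[1.0, -2.0]], [0.0]

def linear_net(x):
    """Two affine layers with no activation in between."""
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)[0]

# They collapse into ONE affine map: W = W2 @ W1, b = W2 @ b1 + b2.
W_eff = [[sum(W2[0][k] * W1[k][j] for k in range(2)) for j in range(2)]]
b_eff = add(matvec(W2, b1), b2)

def collapsed_net(x):
    return add(matvec(W_eff, x), b_eff)[0]

def relu_net(x):
    """Same weights, with ReLU applied to the hidden layer."""
    return add(matvec(W2, relu_vec(add(matvec(W1, x), b1))), b2)[0]

xor_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
for x in xor_inputs:
    # linear/collapsed columns: 2.0, 1.0, 1.0, 0.0; relu column: 0.0, 1.0, 1.0, 0.0 (XOR)
    print(x, linear_net(x), collapsed_net(x), relu_net(x))

# Why no affine f can fit XOR: f(0,0) + f(1,1) == f(0,1) + f(1,0) for every affine f,
# but XOR would need 0 + 0 == 1 + 1.
f = collapsed_net
print(f([0.0, 0.0]) + f([1.0, 1.0]) == f([0.0, 1.0]) + f([1.0, 0.0]))  # True
```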

Thanks.


(Stephan Rasp) #2

http://neuralnetworksanddeeplearning.com/chap4.html

Maybe this will help you understand it. The author uses a step function as the nonlinearity, but ReLU would work equally well.
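As a sketch of the claim that ReLU works too (this is not the chapter’s own step-function construction): a one-hidden-layer ReLU network can be wired by hand to linearly interpolate any 1D function at evenly spaced knots. Hidden unit i switches on at knot i, and its output weight is the change in slope there. The function names and the choice of `sin` on [0, π] are illustrative assumptions; the weights are computed, not trained.

```python
import math

def relu(z):
    return max(0.0, z)

def fit_relu_interpolant(f, lo, hi, n_units):
    """Hand-build g(x) = f(lo) + sum_i c_i * relu(x - k_i), a 1-hidden-layer ReLU net
    that linearly interpolates f at n_units + 1 evenly spaced knots on [lo, hi]."""
    knots = [lo + (hi - lo) * i / n_units for i in range(n_units + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / (knots[i + 1] - knots[i])
              for i in range(n_units)]
    # Output weight of hidden unit i = change in slope at knot i.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_units)]
    bias = f(lo)

    def g(x):
        return bias + sum(c * relu(x - k) for c, k in zip(coeffs, knots[:-1]))
    return g

def max_error(f, g, lo, hi, samples=1000):
    """Largest |f - g| over a uniform grid of sample points."""
    xs = [lo + (hi - lo) * j / samples for j in range(samples + 1)]
    return max(abs(f(x) - g(x)) for x in xs)

for n in (2, 4, 8, 16, 32):
    g = fit_relu_interpolant(math.sin, 0.0, math.pi, n)
    print(n, "hidden units -> max error", round(max_error(math.sin, g, 0.0, math.pi), 4))
```

For a smooth target, the error of piecewise-linear interpolation should shrink roughly fourfold each time the number of hidden units doubles, which is the "more units, finer approximation" picture from the chapter with ReLU kinks in place of steps.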


(marc) #3

You should play with

It will give you a good intuition of how the different activations work.


(Andrea de Luca) #4

Useful links. But I would have hoped for something a bit more theoretically grounded…


(Dennis Sakva) #5

Here you go:


A good theoretical paper showing that neural networks are piecewise linear and, because of this, susceptible to adversarial examples.
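The "piecewise linear" part can be checked numerically. A minimal sketch, assuming a randomly initialised one-hidden-layer network (the weights, seed, and helper names are hypothetical and do not reproduce the paper’s argument): once you fix which ReLUs are on, the network is exactly one affine map, so input space is tiled into regions on each of which it is linear.

```python
import random

random.seed(0)  # arbitrary seed; the weights below are random, not trained

def relu(z):
    return max(0.0, z)

# Hypothetical toy network: 2 inputs -> H ReLU units -> 1 output.
H = 8
W1 = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(H)]
b1 = [random.gauss(0, 1) for _ in range(H)]
w2 = [random.gauss(0, 1) for _ in range(H)]
b2 = random.gauss(0, 1)

def pre_activations(x):
    return [row[0] * x[0] + row[1] * x[1] + b for row, b in zip(W1, b1)]

def net(x):
    return sum(w * relu(z) for w, z in zip(w2, pre_activations(x))) + b2

def pattern(x):
    """Which hidden ReLUs are 'on' at x."""
    return tuple(z > 0 for z in pre_activations(x))

def local_affine_map(x):
    """Return (a, c) such that net(y) == a . y + c for every y with pattern(y) == pattern(x)."""
    mask = pattern(x)
    a = [sum(w * row[j] for w, row, on in zip(w2, W1, mask) if on) for j in range(2)]
    c = sum(w * b for w, b, on in zip(w2, b1, mask) if on) + b2
    return a, c

x = [0.3, -0.7]
a, c = local_affine_map(x)
for dx, dy in [(1e-3, 0.0), (0.0, -1e-3), (5e-4, 5e-4)]:
    y = [x[0] + dx, x[1] + dy]
    if pattern(y) == pattern(x):  # still inside the same linear region
        print(abs(net(y) - (a[0] * y[0] + a[1] * y[1] + c)))  # rounding error only

# Along a straight segment each hidden pre-activation is affine, so it changes sign
# at most once: a one-hidden-layer net has at most H + 1 linear pieces on a segment.
ts = [i / 2000 for i in range(2001)]
seg = [pattern([-3 + 6 * t, 2 - 4 * t]) for t in ts]
pieces = 1 + sum(p != q for p, q in zip(seg, seg[1:]))
print(pieces, "linear pieces found (at most", H + 1, "possible)")
```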


(Dennis Sakva) #6

And BTW, the regions where sigmoid and tanh are strongly nonlinear (their saturating tails) are associated with exploding/vanishing gradients. So they are nonlinear, but in a way that is hard to exploit in practice.
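A quick numerical illustration of the saturation point (the vanishing side of it), comparing the derivatives of sigmoid, tanh and ReLU. The convention `relu'(0) = 0` is an assumption; any value in [0, 1] is a valid subgradient there.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2

def d_relu(x):
    return 1.0 if x > 0 else 0.0  # subgradient 0 chosen at x == 0

# In the curved, saturating tails the derivatives of sigmoid and tanh collapse toward 0,
# while ReLU's derivative stays exactly 1 on the active side.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={d_sigmoid(x):.2e}  tanh'={d_tanh(x):.2e}  relu'={d_relu(x):.0f}")

# Backprop multiplies one such factor per layer. Ignoring the weight matrices, even at
# sigmoid's best case (x = 0, derivative 0.25), ten stacked sigmoids scale the
# gradient by 0.25**10.
print(0.25 ** 10)
```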


(Andrea de Luca) #7

Thanks, I’m sure I’ll enjoy it. 🙂


(Omar Amin) #8

https://fleuret.org/dlc/

Check lecture 3 (on MLPs), slide 7: you’ll find a visualization of how ReLU is able to approximate a nonlinear function.

Hope it helps.

Thanks


(Andrea de Luca) #9

I didn’t know that course. Thanks, I think it will be interesting for other topics too…

EDIT: OK, found it in handout 3B. It is as I imagined, but it provides proper justification.

Thanks! That answers my question!