Why is a non-linear function the power source of a neural net, from a mathematical perspective?
A quote from fastbook:
Amazingly enough, it can be mathematically proven that this little function can solve any computable problem to an arbitrarily high level of accuracy, if you can find the right parameters for w1 and w2 and if you make these matrices big enough. For any arbitrarily wiggly function, we can approximate it as a bunch of lines joined together; to make it closer to the wiggly function, we just have to use shorter lines. This is known as the universal approximation theorem. The three lines of code that we have here are known as layers. The first and third are known as linear layers, and the second line of code is known variously as a nonlinearity, or activation function.