My understanding is that with deep learning we are trying to build a mathematical function that, given data similar to our training data, predicts similar outputs.

All we ever use is the linear function Y = aX, and to prevent the overall function from being purely linear, we insert activations between layers.
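To make that point concrete, here is a minimal NumPy sketch (my own illustrative example): two stacked linear layers with no activation collapse into a single linear map, while inserting a ReLU between them breaks the collapse.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))        # a batch of 5 inputs with 4 features
W1 = rng.normal(size=(4, 8))       # first "layer" weights
W2 = rng.normal(size=(8, 3))       # second "layer" weights

# Two linear layers with nothing in between...
deep_linear = x @ W1 @ W2
# ...are exactly one linear layer with weights W1 @ W2.
single = x @ (W1 @ W2)
assert np.allclose(deep_linear, single)

# A nonlinearity between the layers breaks this collapse:
relu = lambda z: np.maximum(z, 0.0)
deep_nonlinear = relu(x @ W1) @ W2
assert not np.allclose(deep_nonlinear, single)
```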

I am wondering why, instead of a simple linear function, we don't use more complex functions like Y = aX^2 + bX, or even a sine function.

Shouldn't more complex building blocks get us to a complex end result faster? Or do they become unstable and very hard to train?

Is there any research being done in this area, or is it just assumed that simpler functions work better?

I tried Y = aX^2 + bX as the layer function with Keras last year, on MNIST, but there wasn't any visible advantage to it.
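For reference, the layer I experimented with was along these lines. This is a NumPy sketch with names of my own choosing (in Keras it would be a custom layer with `a` and `b` as extra trainable weights), not the exact code I ran.

```python
import numpy as np

def quadratic_dense(x, W, a, b):
    """A dense layer whose per-unit function is y = a*z**2 + b*z,
    where z = x @ W and a, b are learnable per-unit parameters."""
    z = x @ W
    return a * z**2 + b * z

rng = np.random.default_rng(1)
x = rng.normal(size=(2, 4))    # batch of 2 inputs, 4 features each
W = rng.normal(size=(4, 3))    # 3 output units
a = rng.normal(size=(3,))      # quadratic coefficient per unit
b = rng.normal(size=(3,))      # linear coefficient per unit
out = quadratic_dense(x, W, a, b)
print(out.shape)  # (2, 3)
```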

Is MNIST too simple to reveal any difference, making it a poor test case?

Is it worth experimenting with further, and how would you recommend approaching it?

Deeper nets figure out such non-linear relationships automagically. They probably won't create exact quadratic features, but approximations are OK, and real-world data is noisy anyway.
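To make "approximations are OK" concrete, here is a sketch of how a single hidden layer of ReLU units can track y = x^2. I fix a handful of hinge locations by hand and fit only the output weights by least squares (a stand-in for what gradient descent would find), so the knot positions and unit count are my assumptions, not anything from the post.

```python
import numpy as np

# A one-hidden-layer ReLU net computes a weighted sum of hinges
# max(0, x - c_i) plus a bias: a piecewise-linear function of x.
x = np.linspace(-1.0, 1.0, 201)
knots = np.linspace(-1.0, 1.0, 8)                        # hinge locations c_i
hidden = np.maximum(0.0, x[:, None] - knots[None, :])    # hidden activations
features = np.hstack([hidden, np.ones((x.size, 1))])     # add a bias column

target = x**2
coef, *_ = np.linalg.lstsq(features, target, rcond=None) # fit output weights
approx = features @ coef

max_err = np.abs(approx - target).max()
print(max_err)  # on the order of 1e-2: 8 ReLUs already track x^2 closely
```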

There is an excellent post with visualizations about how nets create nonlinearities: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

My point exactly: by using more complex functions as building blocks, we might be able to reduce the number of layers needed, and thereby reduce training time, model size, or gain other benefits!

I have neither solid evidence nor the experience to answer that question, but it seems that many simple building blocks work better than a few complex ones. Consider sigmoid vs. ReLU.
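One concrete reason the simpler unit wins that comparison: sigmoid's gradient saturates, while ReLU's does not, which matters a lot when gradients are multiplied through many layers. A small NumPy illustration (my own example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6.0, 6.0, 7)  # sample pre-activations from -6 to 6

# Sigmoid's derivative is s*(1-s): never more than 0.25, and it
# vanishes for large |z| -- the "vanishing gradient" problem.
sig_grad = sigmoid(z) * (1.0 - sigmoid(z))

# ReLU's derivative is exactly 1 for z > 0, so gradients pass
# through active units unchanged, no matter how deep the stack.
relu_grad = (z > 0).astype(float)

print(sig_grad.max())  # 0.25, attained at z = 0
```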